Re: [RFC PATCH 0/2] apply write hints to select the type of segments

From: Jaegeuk Kim
Date: Mon Nov 13 2017 - 23:20:38 EST


On 11/13, Hyunchul Lee wrote:
> On 11/13/2017 10:59 AM, Chao Yu wrote:
> > On 2017/11/13 9:35, Hyunchul Lee wrote:
> >> On 11/13/2017 10:26 AM, Chao Yu wrote:
> >>> On 2017/11/13 8:24, Hyunchul Lee wrote:
> >>>> On 11/10/2017 03:42 PM, Chao Yu wrote:
> >>>>> On 2017/11/10 8:23, Hyunchul Lee wrote:
> >>>>>> Hello, Chao
> >>>>>>
> >>>>>> On 11/09/2017 06:12 PM, Chao Yu wrote:
> >>>>>>> On 2017/11/9 13:51, Hyunchul Lee wrote:
> >>>>>>>> From: Hyunchul Lee <cheol.lee@xxxxxxx>
> >>>>>>>>
> >>>>>>>> Using write hints[1], applications can indicate the expected lifetime of the
> >>>>>>>> data written to devices, and [2] reported that the write hints patch
> >>>>>>>> decreased writes to NAND by 25%.
> >>>>>>>>
> >>>>>>>> These hints help F2FS determine the following:
> >>>>>>>> 1) the segment types where the data will be written.
> >>>>>>>> 2) the hints that will be passed down to devices with the data of segments.
> >>>>>>>>
> >>>>>>>> This patch set implements the first mapping from write hints to segment types
> >>>>>>>> as shown below.
> >>>>>>>>
> >>>>>>>> hints                 segment type
> >>>>>>>> -----                 ------------
> >>>>>>>> WRITE_LIFE_SHORT      CURSEG_COLD_DATA
> >>>>>>>> WRITE_LIFE_EXTREME    CURSEG_HOT_DATA
> >>>>>>>> others                CURSEG_WARM_DATA
> >>>>>>>>
> >>>>>>>> The F2FS policy for hot/cold separation takes precedence over these hints, and
> >>>>>>>> hints are not applied to in-place updates.
> >>>>>>>
> >>>>>>> Could we disable IPU if a file/inode write hint exists?
> >>>>>>>
> >>>>>>
> >>>>>> I am afraid this would have side effects. For example, it could cause
> >>>>>> out-of-place updates even when there are not enough free segments.
> >>>>>> I can write a patch that handles these situations, but I wonder whether
> >>>>>> it is required, and I am not sure which IPU policies can be disabled.
> >>>>>
> >>>>> Oh, as I replied in another thread, I think IPU just affects the filesystem's
> >>>>> hot/cold separation, rather than this feature. So I think it will be okay
> >>>>> not to consider it.
> >>>>>
> >>>>>>
> >>>>>>>>
> >>>>>>>> Until the second mapping is implemented, write hints are not passed down
> >>>>>>>> to devices, because it is better for all the data of a segment to have the
> >>>>>>>> same hint.
> >>>>>>>>
> >>>>>>>> [1]: c75b1d9421f80f4143e389d2d50ddfc8a28c8c35
> >>>>>>>> [2]: https://lwn.net/Articles/726477/
> >>>>>>>
> >>>>>>> Could you write a patch to support passing write hints to the block layer for
> >>>>>>> buffered writes, as in the commit below:
> >>>>>>> 0127251c45ae ("ext4: add support for passing in write hints for buffered writes")
> >>>>>>>
> >>>>>>
> >>>>>> Sure I will. I wrote it already ;)
> >>>>>
> >>>>> Cool, ;)
> >>>>>
> >>>>>> I think that data from the same segment should be passed down with the same
> >>>>>> hint, and that the following mapping is reasonable. I would like to hear your
> >>>>>> opinion about it.
> >>>>>>
> >>>>>> segment type        hints
> >>>>>> ------------        -----
> >>>>>> CURSEG_COLD_DATA    WRITE_LIFE_EXTREME
> >>>>>> CURSEG_HOT_DATA     WRITE_LIFE_SHORT
> >>>>>> CURSEG_COLD_NODE    WRITE_LIFE_NORMAL
> >>>>>
> >>>>> We have WRITE_LIFE_LONG defined rather than WRITE_LIFE_NORMAL in fs.h?
> >>>>>
> >>>>>> CURSEG_HOT_NODE     WRITE_LIFE_MEDIUM
> >>>>>
> >>>>> As far as I know, in the cell phone scenario, meta_inode data is the hottest,
> >>>>> followed by hot data and warm node, and cold node should be the coldest. So I
> >>>>> suggest we define the mapping as below:
> >>>>>
> >>>>> META_DATA                WRITE_LIFE_SHORT
> >>>>> HOT_DATA & WARM_NODE     WRITE_LIFE_MEDIUM
> >>>>> HOT_NODE & WARM_DATA     WRITE_LIFE_LONG
> >>>>> COLD_NODE & COLD_DATA    WRITE_LIFE_EXTREME
> >>>>>
> >>>>
> >>>> I agree, but I am not sure that assigning the same hint to a node segment and
> >>>> a data segment is good, because NVMe is likely to write them to the same erase
> >>>> block if they have the same hint.
> >>>
> >>> If we do not give the hint, they can still be written to the same erase block,
> >
> > I mean it's possible to write them to the same erase block. :)
> >
> >>> right? it will not be worse?
> >>>
> >>
> >> If the hint is not given, I think that they could be written to
> >> the same erase block, or not. But if we give the same hint, they are written
> >> to the same block.
> >
> > IMO, we can separate them only if the underlying device supports more hint
> > types or open channels, and the actual temperatures of the data segment and
> > the node segment are quite different.
> >
>
> Okay, if Jaegeuk Kim agrees with this, I will submit a patch that
> implements your proposed mapping.

How about this? We'd better split data and node blocks as much as possible.

segment type             hints
------------             -----
COLD_NODE & COLD_DATA    WRITE_LIFE_NONE
WARM_DATA                WRITE_LIFE_EXTREME
HOT_NODE & WARM_NODE     WRITE_LIFE_LONG
HOT_DATA                 WRITE_LIFE_MEDIUM
META_DATA                WRITE_LIFE_SHORT
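
In code, the table above could look roughly like the sketch below. This is
only an illustration, not the patch: seg_type, life_hint and
seg_type_to_hint() are made-up stand-ins for the CURSEG_* types and the
WRITE_LIFE_* hints, and falling back to "none" for any unlisted type is an
assumption.

```c
/*
 * Illustrative stand-ins only. The real segment types and write-life
 * hints are defined in the kernel headers; these local enums are
 * hypothetical so that the sketch is self-contained.
 */
enum seg_type {
	SEG_HOT_DATA,
	SEG_WARM_DATA,
	SEG_COLD_DATA,
	SEG_HOT_NODE,
	SEG_WARM_NODE,
	SEG_COLD_NODE,
	SEG_META_DATA,
};

enum life_hint {
	LIFE_NONE,
	LIFE_SHORT,
	LIFE_MEDIUM,
	LIFE_LONG,
	LIFE_EXTREME,
};

/* Map a segment type to the hint passed down with its blocks. */
static enum life_hint seg_type_to_hint(enum seg_type type)
{
	switch (type) {
	case SEG_META_DATA:
		return LIFE_SHORT;
	case SEG_HOT_DATA:
		return LIFE_MEDIUM;
	case SEG_HOT_NODE:
	case SEG_WARM_NODE:
		return LIFE_LONG;
	case SEG_WARM_DATA:
		return LIFE_EXTREME;
	case SEG_COLD_NODE:
	case SEG_COLD_DATA:
	default:
		/* Assumption: anything not listed gets no hint. */
		return LIFE_NONE;
	}
}
```

With this mapping, a node segment and a data segment share a hint only in
the cold case, which gets no hint at all.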

>
> Thank you for comments ;)
>
> > Thanks,
> >
> >> I am not sure ;)
> >>
> >>> Thanks,
> >>>
> >>>>
> >>>> Thanks.
> >>>>
> >>>>> Thanks,
> >>>>>
> >>>>>> others              WRITE_LIFE_NONE
> >>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Hyunchul Lee (2):
> >>>>>>>>   f2fs: apply write hints to select the type of segments for buffered
> >>>>>>>>     write
> >>>>>>>>   f2fs: apply write hints to select the type of segment for direct write
> >>>>>>>>
> >>>>>>>>  fs/f2fs/data.c    | 101 ++++++++++++++++++++++++++++++++----------------------
> >>>>>>>>  fs/f2fs/f2fs.h    |   1 +
> >>>>>>>>  fs/f2fs/segment.c |  14 +++++++-
> >>>>>>>>  3 files changed, 74 insertions(+), 42 deletions(-)
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>> .
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>> .
> >>>>
> >>>
> >>>
> >>
> >> .
> >>
> >
> >