RE: [f2fs-dev] [PATCH RESEND] f2fs: modify the procedure of scan free nid

From: Fan Li
Date: Mon Nov 06 2017 - 02:10:59 EST




> -----Original Message-----
> From: Chao Yu [mailto:chao@xxxxxxxxxx]
> Sent: Friday, November 03, 2017 9:16 PM
> To: Fan Li; 'Chao Yu'; 'Jaegeuk Kim'
> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-f2fs-devel@xxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [f2fs-dev] [PATCH RESEND] f2fs: modify the procedure of scan free nid
>
> On 2017/11/3 18:29, Fan Li wrote:
> >
> >
> >> -----Original Message-----
> >> From: Chao Yu [mailto:yuchao0@xxxxxxxxxx]
> >> Sent: Friday, November 03, 2017 4:54 PM
> >> To: Fan Li; 'Chao Yu'; 'Jaegeuk Kim'
> >> Cc: linux-kernel@xxxxxxxxxxxxxxx;
> >> linux-f2fs-devel@xxxxxxxxxxxxxxxxxxxxx
> >> Subject: Re: [f2fs-dev] [PATCH RESEND] f2fs: modify the procedure of
> >> scan free nid
> >>
> >> On 2017/11/3 15:31, Fan Li wrote:
> >>> In current version, we preserve 8 pages of nat blocks as free nids,
> >>> we build bitmaps for it and use them to allocate nids until its
> >>> number drops below NAT_ENTRY_PER_BLOCK.
> >>>
> >>> After that, we have a problem, scan_free_nid_bits will scan the same
> >>> 8 pages trying to find more free nids, but in most cases the free
> >>> nids in these bitmaps are already in free list, scan them won't get
> >>> us any new nids.
> >>> Further more, after scan_free_nid_bits, the scan is over if
> >>> nid_cnt[FREE_NID] != 0.
> >>> It causes that we scan the same pages over and over again, and no
> >>> new free nids are found until nid_cnt[FREE_NID]==0. While the
> >>> scanned pages increase, the problem grows worse.
> >>>
> >>> This patch mark the range where new free nids could exist and keep
> >>> scan for free nids until nid_cnt[FREE_NID] >= NAT_ENTRY_PER_BLOCK.
> >>> The new vairable first_scan_block marks the start of the range, it's
> >>> initialized with NEW_ADDR, which means all free nids before
> >>> next_scan_nid are already in free list; and use next_scan_nid as the
> >>> end of the range since all free nids which are scanned in
> >>> scan_free_nid_bits must be smaller next_scan_nid.
> >>
> >> Think over again, IMO, we can add an variable for stating total count
> >> of free nids in bitamp, if there is no free nid, just
> > skipping scanning all
> >> existed bitmap.
> >>
> >> And if there is only few free nid scattered in bitmap, the cost will
> >> be limited because we will skip scanning
> > nm_i::free_nid_bitmap if
> >> nm_i::free_nid_count is zero. Once we find one free nid, let's skip out.
> >>
> >> Since there shouldn't be very heavy overhead for CPU during traveling
> >> nm_i::nat_block_bitmap, I expect below change could be more simple for maintaining and being with the same effect.
> >>
> >> How do you think?
> >>
> >
> > I think if you need this to work, check total_bitmap_free_nid may not be sufficient enough.
> > The problem this patch presents is that even all the free nids are
> > already in the free list, we still scan all the pages.
> > The scan proceeds once free nid count is below NAT_ENTRY_PER_BLOCK.
> > So in most cases, there are still free nids in the bitmap during the
> > scan, and current codes will check every one of them to see if they are actually in free list.
> > If only check total_bitmap_free_nid == 0 won't take this overhead away.
>
> Oh, you could see that, we have added free_nid_count in each NAT block's free_nid_bitmap bitmap, before scan the bitmap, we will
make
> sure there is at least one free nid.
>
> scan_free_nid_bits()
> for (i = 0; i < nm_i->nat_blocks; i++) {
> if (!test_bit_le(i, nm_i->nat_block_bitmap))
> continue;
> if (!nm_i->free_nid_count[i])
> continue;
Do you mean free_nid_count here?
I thought free_nid_count only represents how many nats are marked free in bitmap of one block.

To my understanding, even a nat is already in the free list, it will still have a bit marked as free in
free_nid_bitmap and a count in free_nid_count.
That means if free_nid_count != 0, and there are marked bits in the bitmap, the free nats in this
block could still be all in the free list.
The purpose of scan is to find new nats and add them to free list, go through the nats which are
already in the free list isn't what we want.
And in xfstest, under most cases scan_free_nid_bits runs, all free nats are indeed in the free list.

> for (idx = 0; idx < NAT_ENTRY_PER_BLOCK; idx++) {
> nid_t nid;
>
> if (!test_bit_le(idx, nm_i->free_nid_bitmap[i]))
> continue;
>
> nid = i * NAT_ENTRY_PER_BLOCK + idx;
> add_free_nid(sbi, nid, true);
>
> if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS)
> goto out;
> }
> }
>
> And In that diff, we have changed the exiting condition, once we have grabbed one free nid, stop building.
>
> >> - if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS)
> >> + if (nm_i->nid_cnt[FREE_NID])
> >> goto out;
>
>
> So with that simple change, only overhead here is we need to travel nat_block_bitmap all the time when total_bitmap_free_nid is
nonzero,
> but I think that would not be an critical issue here.
>
> >
> > I considered a lot of ways to fix this problem before I submit this
> > patch, One of my idea is quite similar to yours, but I use "if
> > (total_bitmap_free_nid == nm_i->nid_cnt[FREE_NID])" to decide whether
> > skip or not.
>
> Hmm.. can we confirm that if there is no free nid in all bitmap, we can skip the unneeded scanning? Anyway, I think you can write
a patch to
> fix that first?
> More like that diff.
>
> > If you insist, I can submit this simpler one instead, but some follow
> > upgrade would be unavailable, for example, use smaller granularity for
> > tracking last-scanned-position that we talked about.> I know sometimes
> > I can be obsessed with the performance, I usually choose the faster
> > way over simpler ones. If you think it's too much, please tell me, I'm
> > sure we can find some middle ground.
>
> Yup, I think that's why you're the expert of algorithm, I have no doubt about that. :)
>
> IMO, instead of reducing cpu overhead without simple change, I prefer the one can reducing IO, e.g. if NAT block contains maximum
count
> free nids, we can load these nids first, after they were been allocated, in checkpoint, we can write these nat entries into one
NAT block. On
> the contrary, if we load free nids with same count from different NAT blocks, in checkpoint, maybe we will write them into more
NAT blocks.
>
> Thanks,
>
> >
> > Thank you
> >
> >
> >> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index
> >> cb3f10bc8723..238d95e89dec 100644
> >> --- a/fs/f2fs/f2fs.h
> >> +++ b/fs/f2fs/f2fs.h
> >> @@ -729,6 +729,7 @@ struct f2fs_nm_info {
> >> unsigned char (*free_nid_bitmap)[NAT_ENTRY_BITMAP_SIZE];
> >> unsigned char *nat_block_bitmap;
> >> unsigned short *free_nid_count; /* free nid count of NAT block */
> >> + unsigned int total_bitmap_free_nid; /* total free nid count in bitmap */
> >>
> >> /* for checkpoint */
> >> char *nat_bitmap; /* NAT bitmap pointer */
> >> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index
> >> fef5c68886b1..e4861908a396 100644
> >> --- a/fs/f2fs/node.c
> >> +++ b/fs/f2fs/node.c
> >> @@ -1911,10 +1911,13 @@ static void update_free_nid_bitmap(struct f2fs_sb_info *sbi, nid_t nid,
> >> else
> >> __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
> >>
> >> - if (set)
> >> + if (set) {
> >> nm_i->free_nid_count[nat_ofs]++;
> >> - else if (!build)
> >> + nm_i->total_bitmap_free_nid++;
> >> + } else if (!build) {
> >> nm_i->free_nid_count[nat_ofs]--;
> >> + nm_i->total_bitmap_free_nid--;
> >> + }
> >> }
> >>
> >> static void scan_nat_page(struct f2fs_sb_info *sbi, @@ -1958,6
> >> +1961,9 @@ static void scan_free_nid_bits(struct f2fs_sb_info
> > *sbi)
> >>
> >> down_read(&nm_i->nat_tree_lock);
> >>
> >> + if (!nm_i->total_bitmap_free_nid)
> >> + goto out;
> >> +
> >> for (i = 0; i < nm_i->nat_blocks; i++) {
> >> if (!test_bit_le(i, nm_i->nat_block_bitmap))
> >> continue;
> >> @@ -1972,7 +1978,7 @@ static void scan_free_nid_bits(struct f2fs_sb_info *sbi)
> >> nid = i * NAT_ENTRY_PER_BLOCK + idx;
> >> add_free_nid(sbi, nid, true);
> >>
> >> - if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS)
> >> + if (nm_i->nid_cnt[FREE_NID])
> >> goto out;
> >> }
> >> }
> >>
> >> Thanks,
> >>
> >>>
> >>> Signed-off-by: Fan li <fanofcode.li@xxxxxxxxxxx>
> >>> ---
> >>> fs/f2fs/f2fs.h | 1 +
> >>> fs/f2fs/node.c | 42 +++++++++++++++++++++++++++++++++++-------
> >>> 2 files changed, 36 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index e0ef31c..ae1cf91
> >>> 100644
> >>> --- a/fs/f2fs/f2fs.h
> >>> +++ b/fs/f2fs/f2fs.h
> >>> @@ -705,6 +705,7 @@ struct f2fs_nm_info {
> >>> nid_t max_nid; /* maximum possible node ids */
> >>> nid_t available_nids; /* # of available node ids */
> >>> nid_t next_scan_nid; /* the next nid to be scanned */
> >>> + block_t first_scan_block; /* the first NAT block to be scanned */
> >>> unsigned int ram_thresh; /* control the memory footprint */
> >>> unsigned int ra_nid_pages; /* # of nid pages to be readaheaded */
> >>> unsigned int dirty_nats_ratio; /* control dirty nats ratio threshold */
> >>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 3d0d1be..f921e0c
> >>> 100644
> >>> --- a/fs/f2fs/node.c
> >>> +++ b/fs/f2fs/node.c
> >>> @@ -1812,7 +1812,7 @@ static bool add_free_nid(struct f2fs_sb_info *sbi, nid_t nid, bool build)
> >>> struct f2fs_nm_info *nm_i = NM_I(sbi);
> >>> struct free_nid *i, *e;
> >>> struct nat_entry *ne;
> >>> - int err = -EINVAL;
> >>> + int need_free = 1;
> >>> bool ret = false;
> >>>
> >>> /* 0 nid should not be used */
> >>> @@ -1863,13 +1863,25 @@ static bool add_free_nid(struct f2fs_sb_info *sbi, nid_t nid, bool build)
> >>> }
> >>> }
> >>> ret = true;
> >>> - err = __insert_free_nid(sbi, i, FREE_NID);
> >>> + need_free = __insert_free_nid(sbi, i, FREE_NID);
> >>> err_out:
> >>> spin_unlock(&nm_i->nid_list_lock);
> >>> radix_tree_preload_end();
> >>> err:
> >>> - if (err)
> >>> + if (need_free)
> >>> kmem_cache_free(free_nid_slab, i);
> >>> + /*
> >>> + * For nid that should be free but not in the free
> >>> + * structure, update the scan range in hope of adding
> >>> + * it in the next scan.
> >>> + */
> >>> + if (!ret || need_free < 0) {
> >>> + block_t tmp_block = NAT_BLOCK_OFFSET(nid);
> >>> +
> >>> + if (tmp_block < nm_i->first_scan_block)
> >>> + nm_i->first_scan_block = tmp_block;
> >>> + }
> >>> +
> >>> return ret;
> >>> }
> >>>
> >>> @@ -1950,10 +1962,17 @@ static void scan_free_nid_bits(struct f2fs_sb_info *sbi)
> >>> struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
> >>> struct f2fs_journal *journal = curseg->journal;
> >>> unsigned int i, idx;
> >>> + unsigned int max_blocks = NAT_BLOCK_OFFSET(nm_i->next_scan_nid);
> >>>
> >>> - down_read(&nm_i->nat_tree_lock);
> >>> + /* every free nid in blocks scanned previously is in the free list */
> >>> + if (nm_i->first_scan_block == NEW_ADDR)
> >>> + return;
> >>>
> >>> - for (i = 0; i < nm_i->nat_blocks; i++) {
> >>> + if (max_blocks == 0)
> >>> + max_blocks = nm_i->nat_blocks;
> >>> +
> >>> + down_read(&nm_i->nat_tree_lock);
> >>> + for (i = nm_i->first_scan_block; i < max_blocks; i++) {
> >>> if (!test_bit_le(i, nm_i->nat_block_bitmap))
> >>> continue;
> >>> if (!nm_i->free_nid_count[i])
> >>> @@ -1967,10 +1986,13 @@ static void scan_free_nid_bits(struct f2fs_sb_info *sbi)
> >>> nid = i * NAT_ENTRY_PER_BLOCK + idx;
> >>> add_free_nid(sbi, nid, true);
> >>>
> >>> - if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS)
> >>> + if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS) {
> >>> + nm_i->first_scan_block = i;
> >>> goto out;
> >>> + }
> >>> }
> >>> }
> >>> + nm_i->first_scan_block = NEW_ADDR;
> >>> out:
> >>> down_read(&curseg->journal_rwsem);
> >>> for (i = 0; i < nats_in_cursum(journal); i++) { @@ -2010,7 +2032,7
> >>> @@ static void __build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount)
> >>> /* try to find free nids in free_nid_bitmap */
> >>> scan_free_nid_bits(sbi);
> >>>
> >>> - if (nm_i->nid_cnt[FREE_NID])
> >>> + if (nm_i->nid_cnt[FREE_NID] >= NAT_ENTRY_PER_BLOCK)
> >>> return;
> >>> }
> >>>
> >>> @@ -2163,6 +2185,7 @@ int try_to_free_nids(struct f2fs_sb_info *sbi, int nr_shrink)
> >>> struct f2fs_nm_info *nm_i = NM_I(sbi);
> >>> struct free_nid *i, *next;
> >>> int nr = nr_shrink;
> >>> + nid_t min_nid = nm_i->max_nid;
> >>>
> >>> if (nm_i->nid_cnt[FREE_NID] <= MAX_FREE_NIDS)
> >>> return 0;
> >>> @@ -2176,11 +2199,15 @@ int try_to_free_nids(struct f2fs_sb_info *sbi, int nr_shrink)
> >>> nm_i->nid_cnt[FREE_NID] <= MAX_FREE_NIDS)
> >>> break;
> >>>
> >>> + if (i->nid < min_nid)
> >>> + min_nid = i->nid;
> >>> __remove_free_nid(sbi, i, FREE_NID);
> >>> kmem_cache_free(free_nid_slab, i);
> >>> nr_shrink--;
> >>> }
> >>> spin_unlock(&nm_i->nid_list_lock);
> >>> + if (min_nid != nm_i->max_nid)
> >>> + nm_i->first_scan_block = NAT_BLOCK_OFFSET(min_nid);
> >>> mutex_unlock(&nm_i->build_lock);
> >>>
> >>> return nr - nr_shrink;
> >>> @@ -2674,6 +2701,7 @@ static int init_node_manager(struct f2fs_sb_info *sbi)
> >>> init_rwsem(&nm_i->nat_tree_lock);
> >>>
> >>> nm_i->next_scan_nid = le32_to_cpu(sbi->ckpt->next_free_nid);
> >>> + nm_i->first_scan_block = NEW_ADDR;
> >>> nm_i->bitmap_size = __bitmap_size(sbi, NAT_BITMAP);
> >>> version_bitmap = __bitmap_ptr(sbi, NAT_BITMAP);
> >>> if (!version_bitmap)
> >>>
> >>
> >
> >