Re: [PATCH -next v6] mm: vmscan: retry folios written back while isolated for traditional LRU

From: Yu Zhao
Date: Tue Dec 24 2024 - 14:28:52 EST


On Mon, Dec 23, 2024 at 11:45 PM Chen Ridong <chenridong@xxxxxxxxxxxxxxx> wrote:
>
>
>
> On 2024/12/24 12:19, Yu Zhao wrote:
> > On Mon, Dec 23, 2024 at 1:30 AM Chen Ridong <chenridong@xxxxxxxxxxxxxxx> wrote:
> >>
> >> From: Chen Ridong <chenridong@xxxxxxxxxx>
> >>
> >> The page reclaim isolates a batch of folios from the tail of one of the
> >> LRU lists and works on those folios one by one. For a suitable
> >> swap-backed folio, if the swap device is async, it queues that folio for
> >> writeback. After the page reclaim finishes an entire batch, it puts back
> >> the folios it queued for writeback to the head of the original LRU list.
> >>
> >> In the meantime, the page writeback flushes the queued folios also by
> >> batches. Its batching logic is independent from that of the page reclaim.
> >> For each of the folios it writes back, the page writeback calls
> >> folio_rotate_reclaimable() which tries to rotate a folio to the tail.
> >>
> >> folio_rotate_reclaimable() only works for a folio after the page reclaim
> >> has put it back. If an async swap device is fast enough, the page
> >> writeback can finish with that folio while the page reclaim is still
> >> working on the rest of the batch containing it. In this case, that folio
> >> will remain at the head and the page reclaim will not retry it before
> >> reaching there.
> >
> > For starters, copying & pasting others' commit messages as your own is
> > plagiarism. You need to quote them.
>
> Hi, Yu, Thank you for reminding me, I did not mean any plagiarism. I am
> a beginner, and I do not know much about that.

I didn't think you were intentional but please also understand this is
a very basic rule that applies everywhere, not just this case.

> I wrote the message in my v1 and v2 to describe what issue I was fixing,
> which is wordy. What you wrote is much clearer in the commit
> 359a5e1416ca, so I pasted it. I am sorry. Should resend a new patch to
> modify the message?
>
> I have sent all patch versions to you, and I don't know whether you have
> noticed. How I wish you could point it out at the first I pasted it, so
> I wouldn't have made this mistake again and again.

Here is an example of how I quoted from another commit message:
https://lore.kernel.org/20241107202033.2721681-5-yuzhao@xxxxxxxxxx/

> >> The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
> >> while isolated") only fixed the issue for mglru. However, this issue
> >> also exists in the traditional active/inactive LRU.
> >
> > You need to prove it with some numbers.
>
> Do you mean I should prove it with some info in my message? Actually, I
> encountered this in the traditional active/inactive LRU, I did not know
> you had fixed for mglru until Barry told me. I offered how to reproduce
> this with Link.

Yes, the first link does describe how to repro, but you also need to
show the results after your patch, i.e., the improvements from the
change being reviewed.

> >> This issue will be
> >> worse if THP is split, which makes the list longer and needs longer time
> >> to finish a batch of folios reclaim.
> >>
> >> This issue should be fixed in the same way for the traditional LRU.
> >> Therefore, the common logic was extracted to the 'find_folios_written_back'
> >> function firstly, which is then reused in the 'shrink_inactive_list'
> >> function. Finally, retry reclaiming those folios that may have missed the
> >> rotation for traditional LRU.
> >>
> >> Link: https://lore.kernel.org/linux-kernel/20241010081802.290893-1-chenridong@xxxxxxxxxxxxxxx/
> >> Link: https://lore.kernel.org/linux-kernel/CAGsJ_4zqL8ZHNRZ44o_CC69kE7DBVXvbZfvmQxMGiFqRxqHQdA@xxxxxxxxxxxxxx/
> >> Signed-off-by: Chen Ridong <chenridong@xxxxxxxxxx>
> >> Reviewed-by: Barry Song <baohua@xxxxxxxxxx>
> >> ---
> >>
> >> v5->v6:
> >> - fix compile error(implicit declaration of function 'lru_gen_distance')
> >> when CONFIG_LRU_GEN is disable.
> >
> > Did you build-test it this time? I don't think LRU_REFS_FLAGS is
> > defined when CONFIG_LRU_GEN=y.
> >
>
> Yes, I tested. I didn't test when CONFIG_LRU_GEN=n in patch v5.
> I tested with CONFIG_LRU_GEN=n and CONFIG_LRU_GEN=y in patch v6. I am
> using the next.

I see. I would base my patches on mm-unstable instead -next, i.e.,
git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable