Re: [PATCH v6 2/6] mm/vmscan: protect the workingset on anonymous LRU

From: Joonsoo Kim
Date: Mon Jul 20 2020 - 02:53:39 EST


2020ë 7ì 17ì (ê) ìí 10:59, Johannes Weiner <hannes@xxxxxxxxxxx>ëì ìì:
>
> On Wed, Jun 17, 2020 at 02:26:19PM +0900, js1304@xxxxxxxxx wrote:
> > From: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
> >
> > In current implementation, newly created or swap-in anonymous page
> > is started on active list. Growing active list results in rebalancing
> > active/inactive list so old pages on active list are demoted to inactive
> > list. Hence, the page on active list isn't protected at all.
> >
> > Following is an example of this situation.
> >
> > Assume that 50 hot pages on active list. Numbers denote the number of
> > pages on active/inactive list (active | inactive).
> >
> > 1. 50 hot pages on active list
> > 50(h) | 0
> >
> > 2. workload: 50 newly created (used-once) pages
> > 50(uo) | 50(h)
> >
> > 3. workload: another 50 newly created (used-once) pages
> > 50(uo) | 50(uo), swap-out 50(h)
> >
> > This patch tries to fix this issue.
> > Like as file LRU, newly created or swap-in anonymous pages will be
> > inserted to the inactive list. They are promoted to active list if
> > enough reference happens. This simple modification changes the above
> > example as following.
> >
> > 1. 50 hot pages on active list
> > 50(h) | 0
> >
> > 2. workload: 50 newly created (used-once) pages
> > 50(h) | 50(uo)
> >
> > 3. workload: another 50 newly created (used-once) pages
> > 50(h) | 50(uo), swap-out 50(uo)
> >
> > As you can see, hot pages on active list would be protected.
> >
> > Note that, this implementation has a drawback that the page cannot
> > be promoted and will be swapped-out if re-access interval is greater than
> > the size of inactive list but less than the size of total(active+inactive).
> > To solve this potential issue, following patch will apply workingset
> > detection that is applied to file LRU some day before.
> >
> > v6: Before this patch, all anon pages (inactive + active) are considered
> > as workingset. However, with this patch, only active pages are considered
> > as workingset. So, file refault formula which uses the number of all
> > anon pages is changed to use only the number of active anon pages.
>
> I can see that also from the code, but it doesn't explain why.
>
> And I'm not sure this is correct. I can see two problems with it.
>
> After your patch series, there is still one difference between anon
> and file: cache trim mode. If the "use-once" anon dominate most of
> memory and you have a small set of heavily thrashing files, it would
> not get recognized. File refaults *have* to compare their distance to
> the *entire* anon set, or we could get trapped in cache trimming mode
> even as file pages with access frequencies <= RAM are thrashing.
>
> On the anon side, there is no cache trimming mode. But even if we're
> not in cache trimming mode and active file is already being reclaimed,
> we have to recognize thrashing on the anon side when reuse frequencies
> are within available RAM. Otherwise we treat an inactive file that is
> not being reused as having the same value as an anon page that is
> being reused. And then we may reclaim file and anon at the same rate
> even as anon is thrashing and file is not. That's not right.
>
> We need to activate everything with a reuse frequency <= RAM. Reuse
> frequency is refault distance plus size of the inactive list the page
> was on. This means anon distances should be compared to active anon +
> inactive file + active file, and file distances should be compared to
> active file + inactive_anon + active anon.

You're right. Maybe, I'm confused about something at that time. I will change
it as you suggested.

Thanks.