Re: [PATCH -V2] mm: move idle swap cache pages to the tail of LRU after COW

From: Huang, Ying
Date: Sun May 30 2021 - 19:45:17 EST


Johannes Weiner <hannes@xxxxxxxxxxx> writes:

> On Thu, May 27, 2021 at 04:49:53PM +0800, Huang Ying wrote:
>> With commit 09854ba94c6a ("mm: do_wp_page() simplification"), after
>> COW, the idle swap cache (neither the page nor the corresponding swap
>> entry is mapped by any process) will be left at the original position
>> in the LRU list. While it may be in the active list or the head of
>> the inactive list, so that vmscan may take more overhead or time to
>> reclaim these actually unused pages.
>>
>> To help the page reclaiming, in this patch, after COW, the idle swap
>> cache will be tried to be moved to the tail of the inactive LRU list.
>> To avoid to introduce much overhead to the hot COW code path, all
>> locks are acquired with try locking.
>>
>> To test the patch, we used pmbench memory accessing benchmark with
>> working-set larger than available memory on a 2-socket Intel server
>> with a NVMe SSD as swap device. Test results shows that the pmbench
>> score increases up to 21.8% with the decreased size of swap cache and
>> swapin throughput.
>>
>> Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
>> Suggested-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
>> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>> Cc: Peter Xu <peterx@xxxxxxxxxx>
>> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
>> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
>> Cc: Mel Gorman <mgorman@xxxxxxx>
>> Cc: Rik van Riel <riel@xxxxxxxxxxx>
>> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
>> Cc: Michal Hocko <mhocko@xxxxxxxxxx>
>> Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
>> Cc: Tim Chen <tim.c.chen@xxxxxxxxx>
>>
>> V2:
>>
>> - Move trylock_page() to try_to_free_idle_swapcache() per Rik and
>> Linus' comments.
>> - Fix PageLRU() checking.
>> - Fix THP processing.
>> - Rename the function.
>> ---
>> include/linux/memcontrol.h | 10 ++++++++++
>> include/linux/swap.h | 3 +++
>> mm/memcontrol.c | 12 ++++++++++++
>> mm/memory.c | 2 ++
>> mm/swapfile.c | 39 ++++++++++++++++++++++++++++++++++++++
>> 5 files changed, 66 insertions(+)
>
> Sorry the discussion fizzled out on the last patch.
>
> Let me try to recap this series: on your first submission you directly
> freed the old page if we copied. Linus was worried about overhead in
> the COW path that wouldn't pay off in a real workload. Before getting
> numbers, it was then suggested to move the pages to the tail of the
> LRU and leaving them to reclaim - which was also met with skepticism.
>
> V2 presented the LRU moving version with pmbench numbers that indeed
> show it pays off. However, much simpler direct freeing produces even
> better numbers in the same benchmark. We don't have numbers showing if
> the LRU shuffling would significantly fare better in other workloads.
>
> Purely looking at the code: whether we defer or free, we need to lock
> the page, take the LRU spinlock for this page, and touch the LRU
> linkage. If we free, we add the swapcache deletion and the page
> allocator, but it's most likely the percpu-cached fastpath. If we
> defer, reclaim needs to re-establish information about the page that
> we already had in the COW context, do another LRU operation, do the
> swapcache deletion and go through the allocator, but on cold caches.
>
> Personally, I'm a bit skeptical the extra code complexity and reclaim
> overhead in paging workloads will definitely pay off in intermittently
> paging loads (non-paging wouldn't have swap pages). As far as code
> goes, the point of 09854ba94c6a (+17, -42) was simplification, and
> this adds more lines back in another place. In particular it adds
> another lifetime variant to swap pages which are already somewhat
> unwieldy. OTOH, freeing is a two-liner reusing the swap unmap code:
>
> if (page_copied)
> free_swap_cache(old_page);

Yes. This looks better than my previous version, which duplicated the
code of free_swap_cache(). Thanks for pointing this out.

Best Regards,
Huang, Ying