Re: [PATCH 1/2] mm/madvise: help MADV_PAGEOUT to find swap cache pages

From: Minchan Kim
Date: Thu Mar 26 2020 - 02:24:57 EST


On Mon, Mar 23, 2020 at 04:41:49PM -0700, Dave Hansen wrote:
>
> From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>
> tl;dr: MADV_PAGEOUT ignores unmapped swap cache pages. Enable
> MADV_PAGEOUT to find and reclaim swap cache.
>
> The long story:
>
> Looking for another issue, I wrote a simple test which had two
> processes: a parent and a fork()'d child. The parent reads a
> memory buffer shared by the fork() and the child calls
> madvise(MADV_PAGEOUT) on the same buffer.
>
> The first call to MADV_PAGEOUT does what is expected: it pages
> the memory out and causes faults in the parent. However, after
> that, it does not cause any faults in the parent. MADV_PAGEOUT
> only works once! This was a surprise.
>
> The PTEs in the shared buffer start out pte_present()==1 in
> both parent and child. The first MADV_PAGEOUT operation replaces
> those with pte_present()==0 swap PTEs. The parent process
> quickly faults and recreates pte_present()==1. However, the
> child process (the one calling MADV_PAGEOUT) never touches the
> memory and has retained the non-present swap PTEs.
>
> This situation could also happen in the case where a single
> process had some of its data placed in the swap cache but where
> the memory has not yet been reclaimed.
>
> The MADV_PAGEOUT code has a pte_present()==0 check. It will
> ignore any pte_present()==0 pages. This effectively makes
> unmapped swap cache immune from MADV_PAGEOUT, which is not
> very friendly behavior.
>
> Enable MADV_PAGEOUT to find and reclaim swap cache. Because
> swap cache is not pinned by holding the PTE lock, a reference
> must be held until the page is isolated, at which point a
> second reference is obtained.
>
> Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Acked-by: Minchan Kim <minchan@xxxxxxxxxx>
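
For anyone who wants to poke at the scenario from the description, here is a
rough userspace sketch of one round of it. It is not the test program from
the patch description, which was not posted. The helper names
(alloc_touched, pageout_in_child, read_and_count_faults) are made up for
illustration, and the fallback value for MADV_PAGEOUT is an assumption taken
from the asm-generic headers for systems whose libc headers predate it. Each
call forks a fresh child, so this only shows a single MADV_PAGEOUT pass;
reproducing the "only works once" behavior would need the same child to call
madvise() repeatedly while the parent keeps reading. Whether any pages are
actually reclaimed also depends on swap being configured.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21	/* assumption: asm-generic value, for old libc headers */
#endif

/*
 * Map len bytes of private anonymous memory and write to every byte so
 * that the PTEs are present before fork(). Returns NULL on failure.
 */
static unsigned char *alloc_touched(size_t len)
{
	unsigned char *buf;

	assert(len > 0);
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return NULL;
	memset(buf, 0xaa, len);
	return buf;
}

/*
 * fork() a child that calls madvise(MADV_PAGEOUT) on the inherited buffer
 * and never touches it. Returns 0 if madvise() succeeded, 1 if it failed
 * (for example EINVAL on a kernel without MADV_PAGEOUT), and -1 if
 * fork() or waitpid() failed.
 */
static int pageout_in_child(void *buf, size_t len)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(madvise(buf, len, MADV_PAGEOUT) == 0 ? 0 : 1);
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}

/*
 * Read the whole buffer in the parent, storing the byte sum in *sum.
 * Returns the number of page faults (minor plus major) the process took
 * while reading, or -1 if getrusage() failed.
 */
static long read_and_count_faults(const unsigned char *buf, size_t len,
				  unsigned long *sum)
{
	struct rusage before, after;
	size_t i;

	*sum = 0;
	if (getrusage(RUSAGE_SELF, &before) != 0)
		return -1;
	for (i = 0; i < len; i++)
		*sum += buf[i];
	if (getrusage(RUSAGE_SELF, &after) != 0)
		return -1;
	return (after.ru_minflt - before.ru_minflt) +
	       (after.ru_majflt - before.ru_majflt);
}
```

The intended use is to call alloc_touched() once, then pageout_in_child()
followed by read_and_count_faults() in the parent. A nonzero fault count
after the child's madvise() suggests the pages were paged out; the byte sum
should be unchanged either way.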