Re: [PATCH RFC v2 6/9] mm/khugepaged: remove reuse_swap_page() usage

From: Yang Shi
Date: Thu Jan 27 2022 - 16:23:57 EST


On Wed, Jan 26, 2022 at 2:00 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> reuse_swap_page() currently indicates if we can write to an anon page
> without COW. A COW is required if the page is shared by multiple
> processes (either already mapped or via swap entries) or if there is
> concurrent writeback that cannot tolerate concurrent page modifications.
>
> reuse_swap_page() doesn't check for pending references from other
> processes that already unmapped the page, however,
> is_refcount_suitable() essentially does the same thing in the context of
> khugepaged. khugepaged is the last remaining user of reuse_swap_page() and
> we want to remove that function.
>
> In the context of khugepaged, we are not actually going to write to the
> page and we don't really care about other processes mapping the page:
> for example, without swap, we don't care about shared pages at all.
>
> The current logic seems to be:
> * Writable: -> Not shared, but might be in the swapcache. Nobody can
> fault it in from the swapcache as there are no other swap entries.
> * Readable and not in the swapcache: Might be shared (but nobody can
> fault it in from the swapcache).
> * Readable and in the swapcache: Might be shared and someone might be
> able to fault it in from the swapcache. Make sure we're the exclusive
> owner via reuse_swap_page().
>
> Having to guess due to lack of comments and documentation, the current
> logic really only wants to make sure that a page that might be shared
> cannot be faulted in from the swapcache while khugepaged is active.
> It's hard to guess why that is that case and if it's really still required,
> but let's try keeping that logic unmodified.

I don't think it could be faulted in while khugepaged is active since
khugepaged does hold mmap_lock in write mode IIUC. So page fault is
serialized against khugepaged.

My wild guess is that collapsing shared pages was not supported before
v5.8, so we need reuse_swap_page() to tell us if the page in swap
cache is shared or not. But it is not true anymore. And khugepaged
just allocates a THP then copy the data from base pages to huge page
then replace PTEs to PMD, it doesn't change the content of the page,
so I failed to see a problem by collapsing a shared page in swap
cache. But I'm really not entirely sure, I may miss something...



>
> Instead of relying on reuse_swap_page(), let's unconditionally
> try_to_free_swap(), special casing PageKsm(). try_to_free_swap() will fail
> if there are still swap entries targeting the page or if the page is under
> writeback.
>
> After a successful try_to_free_swap() that page cannot be readded to the
> swapcache because we're keeping the page locked and removed from the LRU
> until we actually perform the copy. So once we succeeded removing a page
> from the swapcache, it cannot be re-added until we're done copying. Add a
> comment stating that.
>
> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
> ---
> mm/khugepaged.c | 16 +++++++++++++---
> 1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 35f14d0a00a6..bc0ff598e98f 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -683,10 +683,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> goto out;
> }
> if (!pte_write(pteval) && PageSwapCache(page) &&
> - !reuse_swap_page(page)) {
> + (PageKsm(page) || !try_to_free_swap(page))) {
> /*
> - * Page is in the swap cache and cannot be re-used.
> - * It cannot be collapsed into a THP.
> + * Possibly shared page cannot be removed from the
> + * swapache. It cannot be collapsed into a THP.
> */
> unlock_page(page);
> result = SCAN_SWAP_CACHE_PAGE;
> @@ -702,6 +702,16 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> result = SCAN_DEL_PAGE_LRU;
> goto out;
> }
> +
> + /*
> + * We're holding the page lock and removed the page from the
> + * LRU. Once done copying, we'll unlock and readd to the
> + * LRU via release_pte_page(). If the page is still in the
> + * swapcache, we're the exclusive owner. Due to the page lock
> + * the page cannot be added to the swapcache until we're done
> + * and consequently it cannot be faulted in from the swapcache
> + * into another process.
> + */
> mod_node_page_state(page_pgdat(page),
> NR_ISOLATED_ANON + page_is_file_lru(page),
> compound_nr(page));
> --
> 2.34.1
>