Re: [PATCH] mm, swap: Fix swapoff with KSM pages
From: Andrew Morton
Date: Thu Dec 27 2018 - 21:56:05 EST
On Wed, 26 Dec 2018 13:15:22 +0800 Huang Ying <ying.huang@xxxxxxxxx> wrote:
> KSM pages may be mapped into multiple VMAs that cannot be reached
> from one anon_vma. So during swapin, a new copy of the page needs to
> be generated if a different anon_vma is needed; please refer to the
> comments of ksm_might_need_to_copy() for details.
>
> During swapoff, unuse_vma() uses the anon_vma (if available) to locate
> the VMA and virtual address mapped to the page, so not all mappings of
> a swapped-out KSM page can be found. So in try_to_unuse(), even if
> the swap count of a swap entry isn't zero, the page needs to be
> deleted from the swap cache, so that in the next round a new page can
> be allocated and swapped in for the other mappings of the swapped-out
> KSM page.
>
> But this conflicts with the THP swap support, where the THP can
> be deleted from the swap cache only after the swap count of every swap
> entry in the huge swap cluster backing the THP has reached 0. So
> try_to_unuse() was changed in commit e07098294adf ("mm, THP, swap:
> support to reclaim swap space for THP swapped out") to check that
> before deleting a page from the swap cache, but that change broke KSM
> swapoff.
>
> Fortunately, KSM applies to normal pages only, so the original
> behavior for KSM pages can be restored easily by checking
> PageTransCompound(). That is how this patch works.
>
> ...
>
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -2197,7 +2197,8 @@ int try_to_unuse(unsigned int type, bool frontswap,
>  		 */
>  		if (PageSwapCache(page) &&
>  		    likely(page_private(page) == entry.val) &&
> -		    !page_swapped(page))
> +		    (!PageTransCompound(page) ||
> +		     !swap_page_trans_huge_swapped(si, entry)))
>  			delete_from_swap_cache(compound_head(page));
>
The patch "mm, swap: rid swapoff of quadratic complexity" changes this
code significantly. There are a few issues with that patch so I'll
drop it for now.
Vineeth, please ensure that future versions retain the above fix,
thanks.
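
For anyone carrying the fix forward, the condition change in the hunk above can be modeled in plain userspace C. This is only a sketch, not kernel code: struct page_model and both helper functions are hypothetical names invented here, each boolean field stands in for the kernel predicate named in its comment, and the model of page_swapped() (per-entry swap count for a normal page, whole-cluster check for a THP) is a simplifying assumption.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the state the hunk inspects. */
struct page_model {
	bool in_swap_cache;         /* models PageSwapCache(page) */
	bool private_matches_entry; /* models page_private(page) == entry.val */
	bool trans_compound;        /* models PageTransCompound(page) */
	bool swap_count_nonzero;    /* this entry's swap count is still > 0 */
	bool huge_cluster_swapped;  /* models swap_page_trans_huge_swapped(si, entry) */
};

/*
 * Assumed simplification of page_swapped(): a THP asks about the whole
 * huge swap cluster, a normal page asks about its own swap entry.
 */
static bool page_swapped_model(struct page_model p)
{
	return p.trans_compound ? p.huge_cluster_swapped
				: p.swap_count_nonzero;
}

/* Condition before the patch: "!page_swapped(page)". */
static bool old_should_delete(struct page_model p)
{
	return p.in_swap_cache && p.private_matches_entry &&
	       !page_swapped_model(p);
}

/*
 * Condition after the patch: a normal page (including a KSM page) is
 * always dropped from the swap cache; a THP is dropped only once no
 * entry in its huge swap cluster is still in use.
 */
static bool new_should_delete(struct page_model p)
{
	return p.in_swap_cache && p.private_matches_entry &&
	       (!p.trans_compound || !p.huge_cluster_swapped);
}
```

In this model the two conditions differ in exactly one case: a normal page whose swap entry still has a nonzero count, which is the swapped-out KSM page with mappings that unuse_vma() could not reach. THP behavior is identical under both.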