Re: interaction of MADV_PAGEOUT with CoW anonymous mappings?
From: Minchan Kim
Date: Thu Mar 12 2020 - 22:00:23 EST
On Thu, Mar 12, 2020 at 02:41:07PM -0700, Dave Hansen wrote:
> One other fun thing. I have a "victim" thread sitting in a loop doing:
>
> sleep(1)
> memcpy(&garbage, buffer, sz);
>
> The "attacker" is doing
>
> madvise(buffer, sz, MADV_PAGEOUT);
>
> in a loop. That, oddly enough, doesn't cause the victim to page fault.
> But, if I do:
>
> memcpy(&garbage, buffer, sz);
> madvise(buffer, sz, MADV_PAGEOUT);
>
> It *does* cause the memory to get paged out. The MADV_PAGEOUT code
> actually has a !pte_present() check and punts on any PTE that fails
> it. In other words, if a page is in the swap cache but not mapped by a
> pte_present() PTE, MADV_PAGEOUT won't touch it.
>
> Shouldn't MADV_PAGEOUT be able to find and reclaim those pages? Patch
> attached.
>
>
> ---
>
> b/mm/madvise.c | 38 +++++++++++++++++++++++++++++++-------
> 1 file changed, 31 insertions(+), 7 deletions(-)
>
> diff -puN mm/madvise.c~madv-pageout-find-swap-cache mm/madvise.c
> --- a/mm/madvise.c~madv-pageout-find-swap-cache 2020-03-12 14:24:45.178775035 -0700
> +++ b/mm/madvise.c 2020-03-12 14:35:49.706773378 -0700
> @@ -248,6 +248,36 @@ static void force_shm_swapin_readahead(s
> #endif /* CONFIG_SWAP */
>
> /*
> + * Given a PTE, find the corresponding 'struct page'. Also handles
> + * non-present swap PTEs.
> + */
> +static struct page *pte_to_reclaim_page(struct vm_area_struct *vma,
> + unsigned long addr, pte_t ptent)
> +{
> + swp_entry_t entry;
> +
> + /* Totally empty PTE: */
> + if (pte_none(ptent))
> + return NULL;
> +
> + /* A normal, present page is mapped: */
> + if (pte_present(ptent))
> + return vm_normal_page(vma, addr, ptent);
> +
Please check is_swap_pte() first before converting the PTE to a swap entry.
> + entry = pte_to_swp_entry(ptent);
> + /* Is it one of the "swap PTEs" that's not really swap? */
> + if (non_swap_entry(entry))
> + return NULL;
> +
> + /*
> + * The PTE was a true swap entry. The page may be in the
> + * swap cache. If so, find it and return it so it may be
> + * reclaimed.
> + */
> + return lookup_swap_cache(entry, vma, addr);
If we go with handling only exclusively owned pages for anon,
I think we should apply the same rule to the swap cache, too.
Do you mind posting it as a formal patch?
Thanks for the explanation of the vulnerability and for the patch, Dave!
> +}
> +
> +/*
> * Schedule all required I/O operations. Do not wait for completion.
> */
> static long madvise_willneed(struct vm_area_struct *vma,
> @@ -389,13 +419,7 @@ regular_page:
> for (; addr < end; pte++, addr += PAGE_SIZE) {
> ptent = *pte;
>
> - if (pte_none(ptent))
> - continue;
> -
> - if (!pte_present(ptent))
> - continue;
> -
> - page = vm_normal_page(vma, addr, ptent);
> + page = pte_to_reclaim_page(vma, addr, ptent);
> if (!page)
> continue;
>
> _