Re: interaction of MADV_PAGEOUT with CoW anonymous mappings?

From: Dave Hansen
Date: Thu Mar 12 2020 - 16:26:25 EST


On 3/12/20 1:16 PM, Minchan Kim wrote:
> On Thu, Mar 12, 2020 at 09:22:48AM +0100, Michal Hocko wrote:
> I'd like to wait for Jann's reply since Dave gave his opinion about the vulnerability.
> https://lore.kernel.org/linux-mm/cf95db88-968d-fee5-1c15-10d024c09d8a@xxxxxxxxx/
> Jann, could you give your insight on whether it's practically possible?

FWIW, just checking for mapcount > 1 seems like a pretty sane fix to me.
I went looking at doing it another way, but Michal was quite correct.
We'd probably end up having to special-case something underneath
shrink_page_list().

> A real dumb question to understand the vulnerability:
>
> The attacker would be able to trigger heavy memory consumption so that he
> could get the pages paged out without MADV_PAGEOUT. I know MADV_PAGEOUT makes
> it easier but he still could do it without MADV_PAGEOUT.
> What makes the difference here?

Causing memory pressure is quite a bit more disruptive than
MADV_PAGEOUT. It's a much more blunt instrument and is likely to result
in a lot of collateral damage and a lot of I/O.

MADV_PAGEOUT is *surgical*. You can target one very specific page if,
for instance, you think that your victim is reading it in a way that is
vulnerable. You can also do it with zero I/O (after the initial pageout).

> To clarify how MADV_PAGEOUT works:
> If another process has accessed the page so that its page table has the
> accessed bit set, MADV_PAGEOUT couldn't page it out.

The attacker doesn't need to make the victim take a major fault; it
just needs to induce *a* fault. I actually did an experiment to see how
this would work in practice.

1. Allocate some memory, touch it
2. fork()
3. In the parent: loop reading the memory
4. In the child: loop running MADV_PAGEOUT

The pages stayed in the swap cache and the parent reading the memory saw
a constant stream of faults.
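
For anyone who wants to try this, the four steps above can be sketched
roughly as follows. This is a reconstruction under stated assumptions, not
the program actually used for the experiment: the function name
run_pageout_experiment(), the page and round counts, and the use of
getrusage() to count faults are all made up for illustration, and the
fallback definition of MADV_PAGEOUT assumes the asm-generic value. On a
kernel older than 5.4, with no swap configured, or with a mapcount check in
place, the fault count will likely stay near zero.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21 /* assumption: asm-generic value, for older libc headers */
#endif

/*
 * Hypothetical helper: returns the number of page faults (minor + major)
 * the parent took while reading, or -1 if setup failed.
 */
long run_pageout_experiment(size_t npages, int rounds)
{
	long ps = sysconf(_SC_PAGESIZE);
	assert(ps > 0);
	size_t page_size = (size_t)ps;
	size_t len = npages * page_size;

	/* Step 1: allocate private anonymous memory and touch it. */
	unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;
	memset(buf, 0xA5, len);

	/* Step 2: fork, so both processes map the same CoW pages. */
	pid_t child = fork();
	if (child < 0) {
		munmap(buf, len);
		return -1;
	}

	if (child == 0) {
		/* Step 4: the child keeps asking for the pages to be paged out. */
		for (;;) {
			if (madvise(buf, len, MADV_PAGEOUT) != 0)
				_exit(1); /* e.g. EINVAL on kernels without MADV_PAGEOUT */
		}
	}

	/* Step 3: the parent loops reading the memory and counts its faults. */
	struct rusage before, after;
	getrusage(RUSAGE_SELF, &before);

	volatile unsigned long sink = 0;
	for (int r = 0; r < rounds; r++)
		for (size_t off = 0; off < len; off += page_size)
			sink += buf[off];

	getrusage(RUSAGE_SELF, &after);

	/* Stop the child (it may already have exited) and clean up. */
	kill(child, SIGKILL);
	waitpid(child, NULL, 0);
	munmap(buf, len);

	return (after.ru_minflt - before.ru_minflt) +
	       (after.ru_majflt - before.ru_majflt);
}
```

Calling something like run_pageout_experiment(16, 100000) from main() and
printing the result is enough to see whether the reader is taking faults.
Only reads are used in the parent so that the pages stay shared and are
never CoW-copied.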