Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

From: Linus Torvalds
Date: Mon Oct 24 2022 - 18:07:51 EST


On Mon, Oct 24, 2022 at 12:58 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
>
> Unless I'm completely misunderstanding what's going on here, the whole
> "remove_table" thing only happens when you "remove a table", meaning
> you free an entire *pagetable*. Just zapping PTEs doesn't trigger that
> logic.

I do have to admit that I'd be happier if this code - and the GUP code
that also relies on "interrupts off" behavior - would just use a
sequence counter instead.

Relying on blocking IPI's is clever, but also clearly very subtle and
somewhat dangerous.

I think our GUP code is a *lot* more important than some "legacy
x86-32 has problems in case you have an incredibly unlikely race that
re-populates the page table with a different page that just happens to
be exactly the same MOD-4GB", so honestly, I don't think the
load-tearing is even worth worrying about - if you have hardware that
is good enough at virtualizing things, it's almost certainly already
64-bit, and running 32-bit virtual machines with PAE you really only
have yourself to blame.

So I can't find it in myself to care about the 32-bit tearing thing,
but this discussion makes me worried about Fast GUP.

Note that even with proper atomic

pte_t pte = ptep_get_lockless(ptep);

in gup_pte_range(), and even if the page tables are RCU-free'd, that
just means that the 'ptep' access itself is safe.

But then you have the whole "the lookup of the page pointer is not
atomic" wrt that. And right now that GUP code does rely on the "block
IPI" to make it basically valid.

I don't think it matters if GUP races with munmap or madvise() or
something like that - if you get the old page, that's still a valid
page, and the user only has himself to blame.

But if we have memory pressure that causes vmscan to push out a page,
and it gets replaced with a new page, and GUP gets the old page with
no serialization, that sounds like a possible source of data
inconsistency.

I don't know if this can happen, but the whole "interrupts disabled
doesn't actually block IPI's and synchronize with TLB flushes" really
sounds like it would affect GUP too. And be much more serious there
than on some x86-32 platform that nobody should be using anyway.

Linus