Re: [BUG?] X86 arch_tlbbatch_flush() seems to be lacking mm_tlb_flush_nested() integration

From: Linus Torvalds
Date: Fri Oct 14 2022 - 15:09:14 EST


On Fri, Oct 14, 2022 at 11:20 AM Jann Horn <jannh@xxxxxxxxxx> wrote:\
>
> The effect would be that after process B removes a mapping with
> munmap() and creates a new mapping in its place, it would still see
> data from the old mapping when trying to access the new mapping.
>
> Am I missing something that protects against this scenario?

While I don't think that scenario is something you can trigger in
practice without enormously bad luck, I don't see anything that would
protect against it.

Afaik, the whole vmscan thing never takes the vm lock, only the file
lock (to protect mapping->i_map) or the anonvma lock (to protect
anon_vma->rb_root).

And none of that ends up serializing with a new mmap() that doesn't
even install a new page in the page tables (and just gets an old TLB
entry). There are zero shared data structures outside of the mm
itself.

Now, munmap() *could* serialize with it, because at least munmap has
access to the data structures and their locks. But it doesn't do any
deferred flushes that I can see, so while it's serialized, it doesn't
help.

And it wouldn't help to do try_to_unmap_flush() from munmap either,
since the deferred flushing is per-thread, and the munmap is done from
a different thread.

So if you're missing something, then I am too.

All this flushing is very careful to flush before actually releasing
the page, which is our really traditional TLB flush bug. But yeah,
that's not the only race - we should flush before replacing the
mapping too.

Mel? I think the batched flushing goes back to you many many years
ago. I hope Jann and me are just being stupid and missing something
obvious.

Linus