Re: [PATCH v6 10/16] mm: replace vm_lock and detached flag with a reference count

From: Peter Zijlstra
Date: Thu Dec 19 2024 - 12:55:23 EST


On Thu, Dec 19, 2024 at 12:16:45PM -0500, Liam R. Howlett wrote:

> Well, hold on - it is taken out of the rmap/anon vma chain here. It is
> completely unhooked except the vma tree at this point. We're not adding
> complexity, we're dealing with it.

So I'm not entirely sure I understand the details here -- this is again
about being able to do rollback when things fail?

There is comment above the vms_clean_up_area() call in __mmap_prepare(),
but its not making sense atm.

> >Is there anything that would prevent a concurrent gup_fast() from
> > > doing the same -- touch a cleared PTE?
>
> Where does gup_fast() install PTEs? Doesn't it bail once a READ_ONCE()
> on any level returns no PTE?

I think you're right, GUP doesn't, but any 'normal' page-table walker
will.

> > > AFAICT two threads, one doing overlapping mmap() and the other doing
> > > gup_fast() can result in exactly this scenario.
>
> The mmap() call will race with the gup_fast(), but either the nr_pinned
> will be returned from gup_fast() before vms_clean_up_area() removes the
> page table (or any higher level), or gup_fast() will find nothing.

Agreed.

> > > If we don't care about the GUP case, when I'm thinking we should not
> > > care about the lockless RCU case either.
> >
> > Also, at this point we'll just fail to find a page, and that is nothing
> > special. The problem with accessing an unmapped VMA is that the
> > page-table walk will instantiate page-tables.
>
> I think there is a problem if we are reinstalling page tables on a vma
> that's about to be removed. I think we are avoiding this with our
> locking though?

So this is purely about the overlapping part, right? We need to remove
the old pages, install the new mapping and have new pages populate the
thing.

But either way around, the range stays valid and page-tables stay
needed.

> > Given this is an overlapping mmap -- we're going to need to those
> > page-tables anyway, so no harm done.
>
> Well, maybe? The mapping may now be an anon vma vs a file backed, or
> maybe it's PROT_NONE?

The page-tables don't care about all that no? The only thing where it
matters is for things like THP, because that affects the level of
page-tables, but otherwise it's all page-table content (ptes).

> > Only after the VMA is unlinked must we ensure we don't accidentally
> > re-instantiate page-tables.
>
> It's not as simple as that, unfortunately. There are vma callbacks for
> drivers (or hugetlbfs, or whatever) that do other things. So we need to
> clean up the area before we are able to replace the vma and part of that
> clean up is the page tables, or anon vma chain, and/or closing a file.
>
> There are other ways of finding the vma as well, besides the vma tree.
> We are following the locking so that we are safe from those perspectives
> as well, and so the vma has to be unlinked in a few places in a certain
> order.

For RCU lookups only the mas tree matters -- and its left present there.

If you really want to block RCU readers, I would suggest punching a hole
in the mm_mt. All the traditional code won't notice anyway, this is all
with mmap_lock held for writing.