Re: [PATCH v6 10/16] mm: replace vm_lock and detached flag with a reference count

From: Peter Zijlstra
Date: Thu Dec 19 2024 - 12:29:20 EST


On Thu, Dec 19, 2024 at 08:14:24AM -0800, Suren Baghdasaryan wrote:
> On Thu, Dec 19, 2024 at 1:13 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Wed, Dec 18, 2024 at 01:53:17PM -0800, Suren Baghdasaryan wrote:
> >
> > > Ah, ok I see now. I completely misunderstood what for_each_vma_range()
> > > was doing.
> > >
> > > Then I think vma_start_write() should remain inside
> > > vms_gather_munmap_vmas() and all vmas in mas_detach should be
> >
> > No, it must not. You really are not modifying anything yet (except the
> > split, which we've already noted mark write themselves).
> >
> > > write-locked, even the ones we are not modifying. Otherwise what would
> > > prevent the race I mentioned before?
> > >
> > > __mmap_region
> > >     __mmap_prepare
> > >         vms_gather_munmap_vmas // adds vmas to be unmapped into mas_detach,
> > >                                // some locked by __split_vma(), some not locked
> > >
> > >                                     lock_vma_under_rcu()
> > >                                         vma = mas_walk // finds unlocked vma also in mas_detach
> > >                                         vma_start_read(vma) // succeeds since vma is not locked
> > >                                         // vma->detached, vm_start, vm_end checks pass
> > >                                     // vma is successfully read-locked
> > >
> > >         vms_clean_up_area(mas_detach)
> > >             vms_clear_ptes
> > >                                     // steps on a cleared PTE
> >
> > So here we have the added complexity that the vma is not unhooked at
> > all. Is there anything that would prevent a concurrent gup_fast() from
> > doing the same -- touch a cleared PTE?
> >
> > AFAICT two threads, one doing overlapping mmap() and the other doing
> > gup_fast() can result in exactly this scenario.
> >
> > If we don't care about the GUP case, then I'm thinking we should not
> > care about the lockless RCU case either.
> >
> > >     __mmap_new_vma
> > >         vma_set_range // installs new vma in the range
> > >     __mmap_complete
> > >         vms_complete_munmap_vmas // vmas are write-locked and detached but it's too late
> >
> > But at this point that old vma really is unhooked, and the
> > vma_start_write() here will ensure readers are gone and it will clear
> > PTEs *again*.
>
> So, to summarize, you want vma_start_write() and vma_mark_detached()
> to be done when we are removing the vma from the tree, right?

*after*

> Something like:

vma_iter_store()
vma_start_write()
vma_mark_detached()

By having vma_start_write() after being unlinked you get the guarantee
of no concurrency. New lookups cannot find you (because of that
vma_iter_store()) and existing readers will be waited for.
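
To make that ordering concrete, here is a minimal userspace sketch in
C11. Everything named toy_* is hypothetical and exists only for this
illustration. It is not how the kernel implements vma_iter_store(),
vma_start_write() or vma_mark_detached(): it assumes a one-entry "tree",
a plain reader count, and no RCU, and instead of actually waiting for
readers it only reports whether the writer would have to wait.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; these are NOT the kernel's types or helpers. */
struct toy_vma {
	atomic_int readers;	/* count of current read-lockers */
	bool detached;
};

struct toy_tree {
	_Atomic(struct toy_vma *) slot;	/* one-entry stand-in for the vma tree */
};

/* Reader side: look the vma up, then take a read reference. */
struct toy_vma *toy_read_lock(struct toy_tree *tree)
{
	struct toy_vma *vma = atomic_load(&tree->slot);

	if (!vma)
		return NULL;	/* already unlinked: new lookups fail */
	atomic_fetch_add(&vma->readers, 1);
	return vma;
}

void toy_read_unlock(struct toy_vma *vma)
{
	atomic_fetch_sub(&vma->readers, 1);
}

/* Step 1 (vma_iter_store() in the thread): unhook from the tree. */
void toy_unlink(struct toy_tree *tree)
{
	atomic_store(&tree->slot, (struct toy_vma *)NULL);
}

/*
 * Step 2 (vma_start_write() in the thread): the real thing waits for
 * readers; this single-threaded toy only reports whether it would
 * have to wait.
 */
bool toy_write_would_block(struct toy_vma *vma)
{
	return atomic_load(&vma->readers) != 0;
}

/* Step 3 (vma_mark_detached() in the thread). */
void toy_mark_detached(struct toy_vma *vma)
{
	assert(!toy_write_would_block(vma));	/* no readers may remain */
	vma->detached = true;
}

/* Walk the ordering once; returns 0 if every expectation holds. */
int toy_demo(void)
{
	struct toy_vma vma = { .detached = false };
	struct toy_tree tree;
	struct toy_vma *old_reader;

	atomic_init(&vma.readers, 0);
	atomic_init(&tree.slot, &vma);

	old_reader = toy_read_lock(&tree);	/* reader from before the unlink */
	if (old_reader != &vma)
		return 1;

	toy_unlink(&tree);
	if (toy_read_lock(&tree) != NULL)	/* new lookups cannot find it */
		return 2;
	if (!toy_write_would_block(&vma))	/* existing reader must be waited for */
		return 3;

	toy_read_unlock(old_reader);
	if (toy_write_would_block(&vma))	/* readers gone: write lock proceeds */
		return 4;

	toy_mark_detached(&vma);
	return vma.detached ? 0 : 5;
}
```

Calling toy_demo() walks the three steps in order. The point of the toy
is only that once the unlink has happened the reader count can no longer
grow, so waiting for it to drain is enough.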

> And the race I described is not a real problem since the vma is still
> in the tree, so gup_fast() does exactly that and will simply reinstall
> the ptes.

Just so.