Re: [PATCH v6 10/16] mm: replace vm_lock and detached flag with a reference count

From: Suren Baghdasaryan
Date: Sun Dec 22 2024 - 22:04:12 EST


On Thu, Dec 19, 2024 at 10:55 AM Liam R. Howlett
<Liam.Howlett@xxxxxxxxxx> wrote:
>
> * Peter Zijlstra <peterz@xxxxxxxxxxxxx> [241219 13:47]:
> > On Thu, Dec 19, 2024 at 01:18:23PM -0500, Liam R. Howlett wrote:
> >
> > > > For RCU lookups only the mas tree matters -- and its left present there.
> > > >
> > > > If you really want to block RCU readers, I would suggest punching a hole
> > > > in the mm_mt. All the traditional code won't notice anyway, this is all
> > > > with mmap_lock held for writing.
> > >
> > > We don't want to block all rcu readers, we want to block the rcu readers
> > > that would see the problem - that is, anyone trying to read a particular
> > > area.
> > >
> > > Right now we can page fault in unpopulated vmas while writing other vmas
> > > to the tree. We are also moving more users to rcu reading to use the
> > > vmas they need without waiting on writes to finish.
> > >
> > > Maybe I don't understand your suggestion, but I would think punching a
> > > hole would lose this advantage?
> >
> > My suggestion was to remove the range stuck in mas_detach from mm_mt.
> > That is exactly the affected range, no?
>
> Yes.
>
> But then looping over the vmas will show a gap where there should not be
> a gap.
>
> If we stop rcu readers entirely we lose the advantage.
>
> This is exactly the issue that the locking dance was working around :)

IOW we write-lock the entire range before removing any part of it for
the whole transaction to be atomic, correct?


Peter, you suggested the following pattern for ensuring vma is
detached with no possible readers:

vma_iter_store()
vma_start_write()
vma_mark_detached()

What do you think about this alternative?

vma_start_write()
...
vma_iter_store()
vma_mark_detached()
vma_assert_write_locked(vma)
if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt)))
vma_start_write()

The second vma_start_write() is unlikely to be executed because the
vma is locked, vm_refcnt might be increased only temporarily by
readers before they realize the vma is locked and that's a very narrow
window. I think performance should not visibly suffer?
OTOH this would let us keep current locking patterns and would
guarantee that vma_mark_detached() always exits with a detached and
unused vma (less possibilities for someone not following an exact
pattern and ending up with a detached but still used vma).

>