Re: [PATCH v6 10/16] mm: replace vm_lock and detached flag with a reference count

From: Suren Baghdasaryan
Date: Mon Dec 16 2024 - 16:53:38 EST


On Mon, Dec 16, 2024 at 1:15 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Mon, Dec 16, 2024 at 11:24:13AM -0800, Suren Baghdasaryan wrote:
>
> FWIW, I find the whole VMA_STATE_{A,DE}TATCHED thing awkward. And

I'm bad with naming things, so any better suggestions are welcome.
Are you suggesting to drop VMA_STATE_{A,DE}TATCHED nomenclature and
use 0/1 values directly?

> perhaps s/VMA_STATE_LOCKED/VMA_LOCK_OFFSET/ ?

Sounds good. I'll change it to VMA_LOCK_OFFSET.

>
> Also, perhaps:
>
> #define VMA_REF_LIMIT (VMA_LOCK_OFFSET - 2)

Ack.

>
> > @@ -699,10 +700,27 @@ static inline void vma_numab_state_free(struct vm_area_struct *vma) {}
> > #ifdef CONFIG_PER_VMA_LOCK
> > static inline void vma_lock_init(struct vm_area_struct *vma)
> > {
> > - init_rwsem(&vma->vm_lock.lock);
> > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > + static struct lock_class_key lockdep_key;
> > +
> > + lockdep_init_map(&vma->vmlock_dep_map, "vm_lock", &lockdep_key, 0);
> > +#endif
> > + refcount_set(&vma->vm_refcnt, VMA_STATE_DETACHED);
> > vma->vm_lock_seq = UINT_MAX;
>
> Depending on how you do the actual allocation (GFP_ZERO) you might want
> to avoid that vm_refcount store entirely.

I think we could initialize it to 0 in the slab cache constructor and
when vma is freed we already ensure it's 0. So, even when reused it
will be in the correct 0 state.

>
> Perhaps instead write: VM_WARN_ON(refcount_read(&vma->vm_refcnt));

Yes, with the above approach that should work.

>
> > @@ -813,25 +849,42 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma)
> >
> > static inline void vma_assert_locked(struct vm_area_struct *vma)
> > {
> > - if (!rwsem_is_locked(&vma->vm_lock.lock))
> > + if (refcount_read(&vma->vm_refcnt) <= VMA_STATE_ATTACHED)
> if (is_vma_detached(vma))
>
> > vma_assert_write_locked(vma);
> > }
> >
> > -static inline void vma_mark_attached(struct vm_area_struct *vma)
> > +/*
> > + * WARNING: to avoid racing with vma_mark_attached(), should be called either
> > + * under mmap_write_lock or when the object has been isolated under
> > + * mmap_write_lock, ensuring no competing writers.
> > + */
> > +static inline bool is_vma_detached(struct vm_area_struct *vma)
> > {
> > - vma->detached = false;
> > + return refcount_read(&vma->vm_refcnt) == VMA_STATE_DETACHED;
> return !refcount_read(&vma->vm_refcnt);
> > }
> >
> > -static inline void vma_mark_detached(struct vm_area_struct *vma)
> > +static inline void vma_mark_attached(struct vm_area_struct *vma)
> > {
> > - /* When detaching vma should be write-locked */
> > vma_assert_write_locked(vma);
> > - vma->detached = true;
> > +
> > + if (is_vma_detached(vma))
> > + refcount_set(&vma->vm_refcnt, VMA_STATE_ATTACHED);
>
> Urgh, so it would be really good to not call this at all them not 0.
> I've not tried to untangle the mess, but this is really awkward. Surely
> you don't add it to the mas multiple times either.

The issue is that when we merge/split/shrink/grow vmas, we skip on
marking them detached while modifying them. Therefore from
vma_mark_attached() POV it will look like we are attaching an already
attached vma. I can try to clean that up if this is really a concern.

>
> Also:
>
> refcount_set(&vma->vm_refcnt, 1);
>
> is so much clearer.

Ok, IIUC you are in favour of dropping VMA_STATE_ATTACHED/VMA_STATE_DETACHED.

>
> That is, should this not live in vma_iter_store*(), right before
> mas_store_gfp() ?

Currently it's done right *after* mas_store_gfp() but I was debating
with myself if it indeed should be *before* insertion into the tree...

>
> Also, ISTR having to set vm_lock_seq right before it?

Yes, vma_mark_attached() requires vma to be write-locked beforehand,
hence the above vma_assert_write_locked(). But oftentimes it's locked
not right before vma_mark_attached() because some other modification
functions also require vma to be write-locked.

>
> > }
> >
> > -static inline bool is_vma_detached(struct vm_area_struct *vma)
> > +static inline void vma_mark_detached(struct vm_area_struct *vma)
> > {
> > - return vma->detached;
> > + vma_assert_write_locked(vma);
> > +
> > + if (is_vma_detached(vma))
> > + return;
>
> Again, this just reads like confusion :/ Surely you don't have the same
> with mas_detach?

I'll double-check if we ever double-mark vma as detached.

Thanks for the review!

>
> > +
> > + /* We are the only writer, so no need to use vma_refcount_put(). */
> > + if (!refcount_dec_and_test(&vma->vm_refcnt)) {
> > + /*
> > + * Reader must have temporarily raised vm_refcnt but it will
> > + * drop it without using the vma since vma is write-locked.
> > + */
> > + }
> > }