Re: [PATCH 12/41] mm: add per-VMA lock and helper functions to control it

From: Michal Hocko
Date: Wed Jan 18 2023 - 08:07:12 EST


On Tue 17-01-23 19:02:55, Jann Horn wrote:
> +locking maintainers
>
> On Mon, Jan 9, 2023 at 9:54 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> > Introduce a per-VMA rw_semaphore to be used during page fault handling
> > instead of mmap_lock. Because there are cases when multiple VMAs need
> > to be exclusively locked during VMA tree modifications, instead of the
> > usual lock/unlock pattern we mark a VMA as locked by taking per-VMA lock
> > exclusively and setting vma->lock_seq to the current mm->lock_seq. When
> > mmap_write_lock holder is done with all modifications and drops mmap_lock,
> > it will increment mm->lock_seq, effectively unlocking all VMAs marked as
> > locked.
> [...]
> > +static inline void vma_read_unlock(struct vm_area_struct *vma)
> > +{
> > + up_read(&vma->lock);
> > +}
>
> One thing that might be gnarly here is that I think you might not be
> allowed to use up_read() to fully release ownership of an object -
> from what I remember, I think that up_read() (unlike something like
> spin_unlock()) can access the lock object after it's already been
> acquired by someone else.

Yes, I think you are right. From a look at the code, the UAF seems
quite unlikely, as there is a ton of work to be done between the
vma_write_lock used to prepare the vma for removal and the actual
removal. That doesn't make it less of a problem though.

> So if you want to protect against concurrent
> deletion, this might have to be something like:
>
> rcu_read_lock(); /* keeps vma alive */
> up_read(&vma->lock);
> rcu_read_unlock();
>
> But I'm not entirely sure about that, the locking folks might know better.

I am not a locking expert, but to me it looks like this should work
because the final cleanup would have to happen after rcu_read_unlock.

Thanks, I have completely missed this aspect of the locking when looking
into the code.

Btw, looking at this again, I have fully realized how hard it actually
is to see that vm_area_free is guaranteed to sync up with ongoing
readers. VMA manipulation functions like __vma_adjust make my head spin.
Would it make more sense to have an RCU-style synchronization point in
vm_area_free, directly before call_rcu? This would add the overhead of
an uncontended down_write, of course.
--
Michal Hocko
SUSE Labs