Re: [PATCH 12/41] mm: add per-VMA lock and helper functions to control it

From: Michal Hocko
Date: Tue Jan 17 2023 - 10:12:52 EST


On Tue 17-01-23 16:04:26, Michal Hocko wrote:
> On Mon 09-01-23 12:53:07, Suren Baghdasaryan wrote:
> > Introduce a per-VMA rw_semaphore to be used during page fault handling
> > instead of mmap_lock. Because there are cases when multiple VMAs need
> > to be exclusively locked during VMA tree modifications, instead of the
> > usual lock/unlock pattern we mark a VMA as locked by taking per-VMA lock
> > exclusively and setting vma->lock_seq to the current mm->lock_seq. When
> > mmap_write_lock holder is done with all modifications and drops mmap_lock,
> > it will increment mm->lock_seq, effectively unlocking all VMAs marked as
> > locked.
>
> I have to say I was struggling a bit with the above and only understood
> what you mean by reading the patch several times. I would phrase it like
> this (feel free to use if you consider this to be an improvement).
>
> Introduce a per-VMA rw_semaphore. The lock implementation relies on
> per-vma and per-mm sequence counters to note exclusive locking:
> - read lock - (implemented by vma_read_trylock) requires the
> vma (vm_lock_seq) and mm (mm_lock_seq) sequence counters to
> differ. If they match then there must be a vma exclusive lock
> held somewhere.
> - read unlock - (implemented by vma_read_unlock) is a trivial
> vma->lock unlock.
> - write lock - (vma_write_lock) requires the mmap_lock to be
> held exclusively, and the current mm counter is recorded in the
> vma. This allows multiple vmas to be locked under a single
> mmap_lock write lock (e.g. during vma merging). The vma counter
> is modified under exclusive vma lock.

Didn't realize one more thing.
Unlike a standard write lock, this implementation can be
called multiple times under a single mmap_lock. In a sense
it is more of a mark_vma_potentially_modified than a lock.

> - write unlock - (vma_write_unlock_mm) is a batch release of all
> vma locks held. It doesn't pair with a specific
> vma_write_lock! It is done before exclusive mmap_lock is
> released by incrementing mm sequence counter (mm_lock_seq).
> - write downgrade - if the mmap_lock is downgraded to the read
> lock all vma write locks are released as well (effectively
> the same as write unlock).
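
To make the sequence-counter rules above concrete, here is a minimal
single-threaded userspace model. It is only a sketch and not the code
from the patch: toy_rwsem, mm_model, vma_model and run_model are
made-up names, the toy rwsem never sleeps, there are no memory
barriers, and the downgrade case is not modelled. It only shows that
matching counters mean "write locked", that repeated vma_write_lock
calls are harmless, and that one counter increment releases every vma
at once.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an rw_semaphore: single-threaded, so it never sleeps. */
struct toy_rwsem {
	int readers;
	bool writer;
};

static bool toy_down_read_trylock(struct toy_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

static void toy_up_read(struct toy_rwsem *s)
{
	assert(s->readers > 0);
	s->readers--;
}

static void toy_down_write(struct toy_rwsem *s)
{
	/* A real rwsem would block here; the model only checks it is free. */
	assert(!s->writer && s->readers == 0);
	s->writer = true;
}

static void toy_up_write(struct toy_rwsem *s)
{
	assert(s->writer);
	s->writer = false;
}

/* Hypothetical model structures; only the counter names follow the mail. */
struct mm_model {
	struct toy_rwsem mmap_lock;
	int mm_lock_seq;
};

struct vma_model {
	struct mm_model *mm;
	struct toy_rwsem lock;
	int vm_lock_seq;
};

/* read lock: succeeds only while the vma and mm counters differ. */
bool vma_read_trylock(struct vma_model *vma)
{
	if (vma->vm_lock_seq == vma->mm->mm_lock_seq)
		return false;
	if (!toy_down_read_trylock(&vma->lock))
		return false;
	/* Re-check under the vma lock in case a writer marked it meanwhile. */
	if (vma->vm_lock_seq == vma->mm->mm_lock_seq) {
		toy_up_read(&vma->lock);
		return false;
	}
	return true;
}

/* read unlock: a plain release of the vma lock. */
void vma_read_unlock(struct vma_model *vma)
{
	toy_up_read(&vma->lock);
}

/* write lock: needs mmap_lock held for write; repeated calls are no-ops. */
void vma_write_lock(struct vma_model *vma)
{
	assert(vma->mm->mmap_lock.writer);
	if (vma->vm_lock_seq == vma->mm->mm_lock_seq)
		return;	/* already marked under this mmap_lock hold */
	toy_down_write(&vma->lock);
	vma->vm_lock_seq = vma->mm->mm_lock_seq;
	toy_up_write(&vma->lock);
}

/* write unlock: batch release of all marked vmas by bumping the mm counter. */
void vma_write_unlock_mm(struct mm_model *mm)
{
	assert(mm->mmap_lock.writer);
	mm->mm_lock_seq++;
}

void mmap_write_lock(struct mm_model *mm)
{
	toy_down_write(&mm->mmap_lock);
}

void mmap_write_unlock(struct mm_model *mm)
{
	vma_write_unlock_mm(mm);
	toy_up_write(&mm->mmap_lock);
}

/* Walks through the rules; returns the number of unexpected outcomes. */
int run_model(void)
{
	struct mm_model mm = { .mm_lock_seq = 0 };
	struct vma_model a = { .mm = &mm, .vm_lock_seq = -1 };
	struct vma_model b = { .mm = &mm, .vm_lock_seq = -1 };
	int failures = 0;

	mmap_write_lock(&mm);
	vma_write_lock(&a);
	vma_write_lock(&a);	/* second call is harmless */
	vma_write_lock(&b);	/* several vmas under one mmap_lock */
	failures += vma_read_trylock(&a) ? 1 : 0;
	failures += vma_read_trylock(&b) ? 1 : 0;
	mmap_write_unlock(&mm);	/* releases a and b together */

	if (vma_read_trylock(&a)) vma_read_unlock(&a); else failures++;
	if (vma_read_trylock(&b)) vma_read_unlock(&b); else failures++;
	return failures;
}
```

Note that nothing in the model pairs a vma_write_lock with its own
unlock; the only release point is the mm counter increment, which is
why the "mark as potentially modified" reading fits.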
--
Michal Hocko
SUSE Labs