Re: [PATCH 41/41] mm: replace rw_semaphore with atomic_t in vma_lock

From: Suren Baghdasaryan
Date: Mon Jan 16 2023 - 23:53:09 EST


On Mon, Jan 16, 2023 at 7:16 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
>
> On Mon, 16 Jan 2023 15:08:48 -0800 Suren Baghdasaryan <surenb@xxxxxxxxxx>
> > On Mon, Jan 16, 2023 at 6:07 AM Hillf Danton <hdanton@xxxxxxxx> wrote:
> > > On Mon, 9 Jan 2023 12:53:36 -0800 Suren Baghdasaryan <surenb@xxxxxxxxxx>
> > > > static inline bool vma_read_trylock(struct vm_area_struct *vma)
> > > > {
> > > > /* Check before locking. A race might cause false locked result. */
> > > > - if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
> > > > + if (vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
> > > > return false;
> > >
> > > Add mb to pair with the above wmb like
> >
> > The wmb above is to ensure the ordering between updates of lock_seq
> > and vm_lock->count (lock_seq is updated first and vm_lock->count only
> > after that). The first access to vm_lock->count in this function is
> > atomic_inc_unless_negative() and it's an atomic RMW operation with a
> > return value. According to the documentation, such functions are fully
> > ordered; therefore I think we already have an implicit full memory
> > barrier between the reads of lock_seq and vm_lock->count here. Am I wrong?
>
> No you are not.

Do you mean that I'm not wrong, or the other way around? Please expand a bit.

> Revisit it in case of full mb not ensured.
>
> #ifndef arch_atomic_inc_unless_negative
> static __always_inline bool
> arch_atomic_inc_unless_negative(atomic_t *v)
> {
>         int c = arch_atomic_read(v);
>
>         do {
>                 if (unlikely(c < 0))
>                         return false;
>         } while (!arch_atomic_try_cmpxchg(v, &c, c + 1));
>
>         return true;
> }
> #define arch_atomic_inc_unless_negative arch_atomic_inc_unless_negative
> #endif

I think your point is that the full mb is not ensured here, specifically
in the case when c < 0 and we return false early?
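
If so, I think that path is fine: when atomic_inc_unless_negative()
fails, vma_read_trylock() returns false and we fall back to mmap_lock,
so no ordering is needed on that path. For reference, a minimal sketch
of what I'm relying on (reconstructed from this series; the comments
and the abbreviation are mine):

        static inline bool vma_read_trylock(struct vm_area_struct *vma)
        {
                /* Check before locking. A race might cause false locked result. */
                if (vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
                        return false;

                /*
                 * Fully ordered RMW on success, so it also orders the
                 * lock_seq read above against the vm_lock->count update.
                 * On the count < 0 failure path we return false and the
                 * ordering does not matter.
                 */
                if (!atomic_inc_unless_negative(&vma->vm_lock->count))
                        return false;

                /* ... rest of the function elided ... */
                return true;
        }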

>
> > >         int count = atomic_read(&vma->vm_lock->count);
> > >         for (;;) {
> > >                 int new = count + 1;
> > >
> > >                 if (count < 0 || new < 0)
> > >                         return false;
> > >
> > >                 new = atomic_cmpxchg(&vma->vm_lock->count, count, new);
> > >                 if (new == count)
> > >                         break;
> > >                 count = new;
> > >                 cpu_relax();
> > >         }
> > >
> > > (wake up waiting readers after taking the lock;
> > > but the write lock owner before this read trylock should be
> > > responsible for waking waiters up.)
> > >
> > > lock acquire for read.
> >
> > This scheme might cause readers to wait, which is not an exact
>
> Could you specify a bit more on wait?

Oh, I misunderstood your intent with that for() loop. Indeed, if a
writer took the lock, the count will be negative and the loop terminates
with a failure. Yeah, I think that would work.
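
I.e. once a writer holds the lock the count stays negative, so a
reader's trylock fails on the first iteration instead of spinning.
Roughly (assuming, as elsewhere in this series, that a write lock
makes vm_lock->count negative):

        writer: write-locks the vma, vm_lock->count goes negative
        reader: count = atomic_read(&vma->vm_lock->count);  /* count < 0 */
        reader: if (count < 0)
                        return false;  /* caller falls back to mmap_lock */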

>
> > replacement for down_read_trylock(). The requirement to wake up
> > waiting readers also complicates things
>
> If the writer lock owner is preempted by a reader while releasing lock,
>
> set count to zero
> <-- preempt
> wake up waiters
>
> then lock is owned by reader but with read waiters.
>
> That is buggy if write waiter starvation is allowed in this patchset.

I don't quite understand your point here. Readers don't wait, so there
can't be "read waiters". Could you please expand with a race diagram
maybe?

>
> > and since we can always fall
> > back to mmap_lock, that complication is unnecessary IMHO. I could use
> > part of your suggestion like this:
> >
> >         int count = atomic_read(&vma->vm_lock->count);
> >         int new = count + 1;
> >
> >         if (count < 0 || new < 0)
> >                 return false;
> >
> >         new = atomic_cmpxchg(&vma->vm_lock->count, count, new);
> >         if (new != count)
> >                 return false;
> >
> > Compared to doing atomic_inc_unless_negative() first, like I did
> > originally, this scheme opens a somewhat wider window for the writer
> > to get in the middle and cause the reader to fail locking, but I
> > don't think it would result in any visible regression.
>
> It is definitely fine for a writer to acquire the lock while a reader
> is doing trylock, no?

Yes.
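
For completeness, the wider window I mentioned looks like this (again
assuming the writer makes vm_lock->count negative):

        reader: count = atomic_read(&vma->vm_lock->count);  /* count >= 0 */
        writer: write-locks the vma, vm_lock->count goes negative
        reader: atomic_cmpxchg(&vma->vm_lock->count, count, count + 1)
                fails because the value changed, so trylock returns false

The reader simply loses the race and takes the mmap_lock slow path;
nobody ends up waiting on vm_lock.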
