Re: [PATCH -next v2] locking/osq_lock: annotate a data race in osq_lock
From: Paul E. McKenney
Date: Mon May 11 2020 - 12:43:22 EST
On Mon, May 11, 2020 at 04:58:13PM +0100, Will Deacon wrote:
> On Sat, May 09, 2020 at 02:36:54PM -0700, Paul E. McKenney wrote:
> > On Sat, May 09, 2020 at 12:53:38PM -0400, Qian Cai wrote:
> > >
> > >
> > > > On May 9, 2020, at 12:12 PM, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > > >
> > > > Ah, and I forgot to ask. Why "if (data_race(prev->next == node)" instead
> > > > of "if (data_race(prev->next) == node"?
> > >
> > > I think the one you suggested is slightly better at pointing out the exact race. Do you want me to resend, or could you squash it in instead?
> >
> > The patch was still at the top of my stack, so I just amended it. Here
> > is the updated version.
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit 13e69ca01ce1621ce74248bda86cfad47fa5a0fa
> > Author: Qian Cai <cai@xxxxxx>
> > Date: Tue Feb 11 08:54:15 2020 -0500
> >
> > locking/osq_lock: Annotate a data race in osq_lock
> >
> > The prev->next pointer can be accessed concurrently as noticed by KCSAN:
> >
> > write (marked) to 0xffff9d3370dbbe40 of 8 bytes by task 3294 on cpu 107:
> >  osq_lock+0x25f/0x350
> >  osq_wait_next at kernel/locking/osq_lock.c:79
> >  (inlined by) osq_lock at kernel/locking/osq_lock.c:185
> >  rwsem_optimistic_spin
> >  <snip>
> >
> > read to 0xffff9d3370dbbe40 of 8 bytes by task 3398 on cpu 100:
> >  osq_lock+0x196/0x350
> >  osq_lock at kernel/locking/osq_lock.c:157
> >  rwsem_optimistic_spin
> >  <snip>
> >
> > The write only stores NULL to prev->next, and the read merely tests
> > whether prev->next equals this_cpu_ptr(&osq_node). Even if the load
> > is torn, the code still works correctly. Thus, mark it as an
> > intentional data race using the data_race() macro.
> >
> > Signed-off-by: Qian Cai <cai@xxxxxx>
> > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> >
> > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> > index 1f77349..1de006e 100644
> > --- a/kernel/locking/osq_lock.c
> > +++ b/kernel/locking/osq_lock.c
> > @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> >  	 */
> >
> >  	for (;;) {
> > -		if (prev->next == node &&
> > +		/*
> > +		 * cpu_relax() below implies a compiler barrier which would
> > +		 * prevent this comparison being optimized away.
> > +		 */
> > +		if (data_race(prev->next) == node &&
> >  		    cmpxchg(&prev->next, node, NULL) == node)
> >  			break;
>
> I'm fine with the data_race() placement, but I don't find the comment
> very helpful. We assign the result of a READ_ONCE() to 'prev' in the
> loop, so I don't think that the cpu_relax() is really relevant.
Suppose that the compiler loaded a value that was not equal to "node".
In that case, the cmpxchg() won't happen, so something else must force
the compiler to do the reload in order to avoid an infinite loop, right?
Or am I missing something here?
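
A minimal userspace sketch of that concern, using made-up names and a bare
"memory"-clobber asm as a stand-in for the compiler barrier that cpu_relax()
supplies further down the loop; this illustrates what the compiler may do,
it is not the kernel code:

struct spin_node {
	struct spin_node *next;		/* updated by another CPU/thread */
};

/*
 * With no compiler barrier in the loop body, the compiler is free to
 * load prev->next once, keep it in a register, and spin forever on a
 * stale value if that first load did not observe @node.
 */
void spin_no_barrier(struct spin_node *prev, struct spin_node *node)
{
	for (;;) {
		if (prev->next == node)		/* load may be hoisted */
			break;
	}
}

/*
 * A "memory"-clobbering asm (what cpu_relax() amounts to as far as the
 * compiler is concerned) forces prev->next to be reloaded on each
 * iteration, so the other side's update is eventually observed.
 */
void spin_with_barrier(struct spin_node *prev, struct spin_node *node)
{
	for (;;) {
		if (prev->next == node)
			break;
		__asm__ __volatile__("" ::: "memory");	/* like barrier() */
	}
}
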
Thanx, Paul
> The reason we don't need READ_ONCE() here is that if we race with
> the writer then either we'll go round the loop again after wrongly
> concluding that prev->next != node, or we'll attempt the cmpxchg()
> because we wrongly thought they were equal, and it will fail.
>
> Make sense?
>
> Will
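
To make those two cases concrete, here is a small pthread sketch with
illustrative stand-ins only (data_race() stubbed out as a plain access,
__atomic_compare_exchange_n() standing in for cmpxchg(); none of this is the
kernel code): the writer clears prev.next while the reader does the racy
check followed by the deciding cmpxchg(). A stale read can only cause an
extra trip around the loop or a failed cmpxchg(), never an incorrect unlink.

#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in only: the real data_race() (include/linux/compiler.h) also
 * tells KCSAN the racy plain access is intentional; either way it adds
 * no ordering or atomicity. */
#define data_race(expr)	(expr)

struct spin_node {
	struct spin_node *next;
};

static struct spin_node node;			/* this CPU's node */
static struct spin_node prev = { .next = &node };

/* Writer: concurrently stores NULL to prev.next, like the marked write
 * in the KCSAN report above. */
static void *writer(void *arg)
{
	(void)arg;
	__atomic_store_n(&prev.next, NULL, __ATOMIC_RELEASE);
	return NULL;
}

/* Reader: the racy hint read plus the compare-and-exchange that alone
 * decides the outcome.  A stale "!= &node" result only means another
 * trip round the loop; a stale "== &node" result only means the
 * compare-and-exchange fails. */
static void *reader(void *arg)
{
	struct spin_node *expected = &node;

	(void)arg;
	if (data_race(prev.next) == &node &&
	    __atomic_compare_exchange_n(&prev.next, &expected, NULL,
					false, __ATOMIC_ACQ_REL,
					__ATOMIC_RELAXED)) {
		/* Unlinked it ourselves: the link was provably intact. */
	}
	return NULL;
}

int main(void)
{
	pthread_t w, r;

	pthread_create(&w, NULL, writer, NULL);
	pthread_create(&r, NULL, reader, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);

	/* Whichever interleaving happened, the pointer ends up NULL and
	 * no incorrect unlink took place. */
	assert(prev.next == NULL);
	return 0;
}
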