Re: [PATCH bpf-next v2 11/26] rqspinlock: Add deadlock detection and recovery

From: Kumar Kartikeya Dwivedi
Date: Fri Feb 07 2025 - 22:04:26 EST


On Sat, 8 Feb 2025 at 02:54, Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Thu, Feb 6, 2025 at 2:54 AM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
> >
> > +/*
> > + * It is possible to run into misdetection scenarios of AA deadlocks on the same
> > + * CPU, and missed ABBA deadlocks on remote CPUs when this function pops entries
> > + * out of order (due to lock A, lock B, unlock A, unlock B) pattern. The correct
> > + * logic to preserve right entries in the table would be to walk the array of
> > + * held locks and swap and clear out-of-order entries, but that's too
> > + * complicated and we don't have a compelling use case for out of order unlocking.
> > + *
> > + * Therefore, we simply don't support such cases and keep the logic simple here.
> > + */
>
> The comment looks obsolete from the old version of this patch.
> Patch 25 is now enforces the fifo order in the verifier
> and code review will do the same for use of res_spin_lock()
> in bpf internals. So pls drop the comment or reword.
>

Ok.

> > +static __always_inline void release_held_lock_entry(void)
> > +{
> > + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks);
> > +
> > + if (unlikely(rqh->cnt > RES_NR_HELD))
> > + goto dec;
> > + WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL);
> > +dec:
> > + this_cpu_dec(rqspinlock_held_locks.cnt);
>
> ..
> > + * We don't have a problem if the dec and WRITE_ONCE above get reordered
> > + * with each other, we either notice an empty NULL entry on top (if dec
> > + * succeeds WRITE_ONCE), or a potentially stale entry which cannot be
> > + * observed (if dec precedes WRITE_ONCE).
> > + */
> > + smp_wmb();
>
> since smp_wmb() is needed to address ordering weakness vs try_cmpxchg_acquire()
> would it make sense to move it before this_cpu_dec() to address
> the 2nd part of the harmless race as well?

So you mean, even if the dec gets ordered with inc, the other side is
bound to notice a NULL entry and not a stale entry, and we'll ensure
the NULL write is always visible before the dec.
Sounds like it should also work, I will think over it for a bit and
probably make this change (and perhaps likewise on the unlock side).