Re: [net-next] bpf: avoid hashtab deadlock with try_lock
From: Tonghao Zhang
Date: Wed Nov 30 2022 - 00:57:22 EST
On Wed, Nov 30, 2022 at 12:13 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 11/30/2022 10:47 AM, Tonghao Zhang wrote:
> > On Wed, Nov 30, 2022 at 9:50 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> >> Hi Hao,
> >>
> >> On 11/30/2022 3:36 AM, Hao Luo wrote:
> >>> On Tue, Nov 29, 2022 at 9:32 AM Boqun Feng <boqun.feng@xxxxxxxxx> wrote:
> >>>> Just to be clear, I meant to refactor htab_lock_bucket() into a try
> >>>> lock pattern. Also after a second thought, the below suggestion doesn't
> >>>> work. I think the proper way is to make htab_lock_bucket() as a
> >>>> raw_spin_trylock_irqsave().
> >>>>
> >>>> Regards,
> >>>> Boqun
> >>>>
> >>> The potential deadlock happens when the lock is contended from the
> >>> same cpu. When the lock is contended from a remote cpu, we would like
> >>> the remote cpu to spin and wait, instead of giving up immediately. As
> >>> this gives better throughput. So replacing the current
> >>> raw_spin_lock_irqsave() with trylock sacrifices this performance gain.
> >>>
> >>> I suspect the source of the problem is the 'hash' that we used in
> >>> htab_lock_bucket(). The 'hash' is derived from the 'key', I wonder
> >>> whether we should use a hash derived from 'bucket' rather than from
> >>> 'key'. For example, from the memory address of the 'bucket'. Because,
> >>> different keys may fall into the same bucket, but yield different
> >>> hashes. If the same bucket can never have two different 'hashes' here,
> >>> the map_locked check should behave as intended. Also because
> >>> ->map_locked is per-cpu, execution flows from two different cpus can
> >>> both pass.
> >> The warning from lockdep is due to the reason the bucket lock A is used in a
> >> no-NMI context firstly, then the same bucke lock is used a NMI context, so
> > Yes, I tested lockdep too, we can't use the lock in NMI(but only
> > try_lock work fine) context if we use them no-NMI context. otherwise
> > the lockdep prints the warning.
> > * for the dead-lock case: we can use the
> > 1. hash & min(HASHTAB_MAP_LOCK_MASK, htab->n_buckets -1)
> > 2. or hash bucket address.
> Use the computed hash will be better than hash bucket address, because the hash
> buckets are allocated sequentially.
> >
> > * for lockdep warning, we should use in_nmi check with map_locked.
> >
> > BTW, the patch doesn't work, so we can remove the lock_key
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c50eb518e262fa06bd334e6eec172eaf5d7a5bd9
> >
> > static inline int htab_lock_bucket(const struct bpf_htab *htab,
> > struct bucket *b, u32 hash,
> > unsigned long *pflags)
> > {
> > unsigned long flags;
> >
> > hash = hash & min(HASHTAB_MAP_LOCK_MASK, htab->n_buckets -1);
> >
> > preempt_disable();
> > if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
> > __this_cpu_dec(*(htab->map_locked[hash]));
> > preempt_enable();
> > return -EBUSY;
> > }
> >
> > if (in_nmi()) {
> > if (!raw_spin_trylock_irqsave(&b->raw_lock, flags))
> > return -EBUSY;
> The only purpose of trylock here is to make lockdep happy and it may lead to
> unnecessary -EBUSY error for htab operations in NMI context. I still prefer add
> a virtual lock-class for map_locked to fix the lockdep warning. So could you use
Hi, what is virtual lock-class ? Can you give me an example of what you mean?
> separated patches to fix the potential dead-lock and the lockdep warning ? It
> will be better you can also add a bpf selftests for deadlock problem as said before.
>
> Thanks,
> Tao
> > } else {
> > raw_spin_lock_irqsave(&b->raw_lock, flags);
> > }
> >
> > *pflags = flags;
> > return 0;
> > }
> >
> >
> >> lockdep deduces that may be a dead-lock. I have already tried to use the same
> >> map_locked for keys with the same bucket, the dead-lock is gone, but still got
> >> lockdep warning.
> >>> Hao
> >>> .
> >
>
--
Best regards, Tonghao