On Wed, Nov 30, 2022 at 11:07 AM Waiman Long <longman@xxxxxxxxxx> wrote:
On 11/29/22 21:47, Tonghao Zhang wrote:
On Wed, Nov 30, 2022 at 9:50 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
Hi Hao,
On 11/30/2022 3:36 AM, Hao Luo wrote:
On Tue, Nov 29, 2022 at 9:32 AM Boqun Feng <boqun.feng@xxxxxxxxx> wrote:
Just to be clear, I meant to refactor htab_lock_bucket() into a trylock
pattern. Also after a second thought, the below suggestion doesn't
work. I think the proper way is to make htab_lock_bucket() as a
raw_spin_trylock_irqsave().
Regards,
Boqun
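
(For reference, a minimal sketch of the trylock pattern Boqun describes, assuming the existing bucket/raw_lock layout; this is only an illustration, not his patch:)

static inline int htab_lock_bucket(const struct bpf_htab *htab,
				   struct bucket *b, u32 hash,
				   unsigned long *pflags)
{
	unsigned long flags;

	/* Never spin on the bucket lock: if it is already held, e.g.
	 * because we re-entered on the same CPU, give up with -EBUSY.
	 * htab and hash are unused in this trylock-only variant.
	 */
	if (!raw_spin_trylock_irqsave(&b->raw_lock, flags))
		return -EBUSY;

	*pflags = flags;
	return 0;
}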
The potential deadlock happens when the lock is contended from the
same cpu. When the lock is contended from a remote cpu, we would like
the remote cpu to spin and wait, instead of giving up immediately, as
this gives better throughput. So replacing the current
raw_spin_lock_irqsave() with trylock sacrifices this performance gain.
I suspect the source of the problem is the 'hash' that we used in
htab_lock_bucket(). The 'hash' is derived from the 'key'; I wonder
whether we should use a hash derived from 'bucket' rather than from
'key'. For example, from the memory address of the 'bucket'. Because,
different keys may fall into the same bucket, but yield different
hashes. If the same bucket can never have two different 'hashes' here,
the map_locked check should behave as intended. Also because
->map_locked is per-cpu, execution flows from two different cpus can
both pass.
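
(A sketch of that idea, assuming the existing HASHTAB_MAP_LOCK_COUNT constant and the kernel's hash_ptr()/ilog2() helpers; an illustration only, not a concrete patch:)

static inline int htab_lock_bucket(const struct bpf_htab *htab,
				   struct bucket *b, u32 hash,
				   unsigned long *pflags)
{
	unsigned long flags;
	/* Derive the map_locked index from the bucket address rather
	 * than from the key hash, so two different keys that fall into
	 * the same bucket can never pick two different map_locked slots.
	 */
	u32 lock_idx = hash_ptr(b, ilog2(HASHTAB_MAP_LOCK_COUNT));

	preempt_disable();
	if (unlikely(__this_cpu_inc_return(*(htab->map_locked[lock_idx])) != 1)) {
		__this_cpu_dec(*(htab->map_locked[lock_idx]));
		preempt_enable();
		return -EBUSY;
	}

	raw_spin_lock_irqsave(&b->raw_lock, flags);
	*pflags = flags;
	return 0;
}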
The warning from lockdep is because the bucket lock A is used in a
non-NMI context first, then the same bucket lock is used in an NMI context, so
lockdep reports a possible deadlock.
Yes, I tested lockdep too: we can't take the lock in NMI context (only
trylock works there) if we also use it in non-NMI context; otherwise
lockdep prints the warning.
* for the dead-lock case, we can use either:
  1. hash & min(HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1)
  2. the hash of the bucket address
* for the lockdep warning, we should use an in_nmi() check together with map_locked.
BTW, the patch doesn't work, so we can remove the lock_key
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c50eb518e262fa06bd334e6eec172eaf5d7a5bd9
static inline int htab_lock_bucket(const struct bpf_htab *htab,
				   struct bucket *b, u32 hash,
				   unsigned long *pflags)
{
	unsigned long flags;

	hash = hash & min(HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1);

	preempt_disable();
	if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
		__this_cpu_dec(*(htab->map_locked[hash]));
		preempt_enable();
		return -EBUSY;
	}

	if (in_nmi()) {
		if (!raw_spin_trylock_irqsave(&b->raw_lock, flags))
			return -EBUSY;
That is not right. You have to do the same steps as above by decrementing
the percpu count and enabling preemption. So you may want to put all these
busy_out steps after the return 0 and use "goto busy_out;" to jump there.
Yes, thanks Waiman, I should add the busy_out label.

	} else {
		raw_spin_lock_irqsave(&b->raw_lock, flags);
	}

	*pflags = flags;
	return 0;
}
BTW, with that change, I believe you can actually remove all the percpu
map_locked count code.
There is a case where we still need it: for example, we run bpf_prog A and bpf_prog B in task
context on the same cpu.
bpf_prog A
update map X
    htab_lock_bucket
        raw_spin_lock_irqsave()
    lookup_elem_raw()
        // bpf prog B is attached on lookup_elem_raw()
        bpf prog B
            update map X again and update the element
                htab_lock_bucket()
                    // dead-lock
                    raw_spin_lock_irqsave()
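
(For completeness, a sketch of how the function above might look with Waiman's busy_out suggestion folded in, while keeping the per-cpu map_locked check for the same-cpu case shown above; an illustration, not the final patch:)

static inline int htab_lock_bucket(const struct bpf_htab *htab,
				   struct bucket *b, u32 hash,
				   unsigned long *pflags)
{
	unsigned long flags;

	hash = hash & min(HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1);

	preempt_disable();
	if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1))
		goto busy_out;

	if (in_nmi()) {
		/* In NMI context only trylock is safe. */
		if (!raw_spin_trylock_irqsave(&b->raw_lock, flags))
			goto busy_out;
	} else {
		raw_spin_lock_irqsave(&b->raw_lock, flags);
	}

	*pflags = flags;
	return 0;

busy_out:
	/* Undo the per-cpu count and re-enable preemption before
	 * reporting -EBUSY, as Waiman pointed out.
	 */
	__this_cpu_dec(*(htab->map_locked[hash]));
	preempt_enable();
	return -EBUSY;
}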