Re: [PATCH bpf v3 0/4] bpf: Free special fields when update hash and local storage maps

From: Menglong Dong

Date: Wed Oct 29 2025 - 02:57:40 EST


On Wed, Oct 29, 2025 at 2:50 PM Leon Hwang <leon.hwang@xxxxxxxxx> wrote:
>
>
>
> On 29/10/25 04:22, Andrii Nakryiko wrote:
> > On Tue, Oct 28, 2025 at 11:10 AM <patchwork-bot+netdevbpf@xxxxxxxxxx> wrote:
[......]
> > [ 424.982379] ? bpf_lru_pop_free+0x2c6/0x1a50
> > [ 424.982382] bpf_lru_pop_free+0x2c6/0x1a50
>
> Right, this is the classic NMI vs spinlock deadlock:
>
> Process Context (CPU 0) NMI Context (CPU 0)
> ======================= ===================
>
> syscall()
> |
> +-> htab_lru_map_update_elem()
> |
> +-> bpf_lru_pop_free()
> |
> +-> spin_lock_irqsave(&lock)
> | +-------------------+
> | | LOCK ACQUIRED [Y] |
> | | IRQs DISABLED |
> | +-------------------+
> |
> +-> [Critical Section]
> | |
> | | Working with LRU...
> | |
> | | +-----------------------+
> | |<---------------------+ ! NMI FIRES! |
> | | +-----------------------+
> | | | (IRQs disabled but |
> | | | NMI ignores that!) |
> | | +-----------------------+
> | | |
> | | [INTERRUPTED] |
> | | [Context saved] |
> | | v
> | | perf_event_nmi_handler()
> | | |
> | | +-> BPF program
> | | |
> | | +-> htab_lru_map_
> | | | update_elem()
> | | |
> | | +-> bpf_lru_pop_
> | | | free()
> | | |
> | | +-> spin_lock_
> | | | irqsave(&lock)
> | | | +------------+
> | | | | TRIES TO |
> | | | | ACQUIRE |
> | | | | SAME LOCK! |
> | | | +------------+
> | | | |
> | | | v
> | | | +------------+
> | |<--------------------------------+---+ ! DEADLOCK |
> | | | +------------+
> | | | | Lock held |
> | | Still holding lock... | | by process |
> | | Waiting for NMI to finish ---+ | | context |
> | | | | | |
> | | | | | NMI waits |
> | | | | | for same |
> | | | | | lock |
> | | | | +------------+
> | | | | |
> | | | | v
> | | | | [Spin forever]
> | | | | |
> | | | +--------+
> | | | (Circular wait)
> | | |
> | | +-> SYSTEM HUNG
> | |
> | +-> [Never reached]
> |
> +-> spin_unlock_irqrestore(&lock)
> [Never reached]
>
>
> +---------------------------------------------------------------------+
> | DEADLOCK SUMMARY |
> +---------------------------------------------------------------------+
> | |
> | Process Context: Holds &loc_l->lock, waiting for NMI to finish |
> | |
> | NMI Context: Trying to acquire &loc_l->lock |
> | (same lock, same CPU) |
> | |
> | Result: Both contexts wait for each other = DEADLOCK |
> | |
> +---------------------------------------------------------------------+
>
> We can fix this by converting the raw_spinlock_t to trylock-based
> approach, similar to the fix for ringbuf in
> commit a650d38915c1 ("bpf: Convert ringbuf map to rqspinlock").

Nice shot!

>
> In bpf_common_lru_pop_free(), we could use:
>
> if (!raw_res_spin_lock_irqsave(&loc_l->lock, flags))
> return NULL;
>
> However, this deadlock is pre-existing and not introduced by this
> series. It's better to send a separate fix for this deadlock.
>
> Hi Menglong, could you follow up on the deadlock fix?

Yeah, with pleasure. As I see, this is not the only place
that can cause deadlock and rqspinlock should be used. And
I'll follow up on such deadlocks.

Thanks!
Menglong Dong

>
> Thanks,
> Leon
>