Re: [PATCH bpf v2] bpf: cpumap: fix race in bq_flush_to_queue on PREEMPT_RT
From: Jiayuan Chen
Date: Thu Feb 12 2026 - 20:38:12 EST
2026/2/12 22:33, "Sebastian Andrzej Siewior" <bigeasy@xxxxxxxxxxxxx mailto:bigeasy@xxxxxxxxxxxxx?to=%22Sebastian%20Andrzej%20Siewior%22%20%3Cbigeasy%40linutronix.de%3E > wrote:
Thanks for the review, Sebastian.
> There is no spin_lock() otherwise there would be no problem.
Right, will fix the description. The per-CPU bq has no lock at all,
which is the root cause...
> …
>
> >
> > Fix this by adding a local_lock_t to xdp_bulk_queue and acquiring it
> > in bq_enqueue() and __cpu_map_flush(). On non-RT kernels, local_lock
> > maps to preempt_disable/enable with zero additional overhead. On
> > PREEMPT_RT, it provides a per-CPU sleeping lock that serializes
> > access to the bq. Use local_lock_nested_bh() since these paths already
> > run under local_bh_disable().
> >
> So you use local_lock_nested_bh() and not local_lock() but you mention
> local_lock. The difference is that the former does not add any
> preempt_disable() on !RT. At this point I am curious how much of this
> was written by you and how much is auto generated.
Sorry about the inconsistencies. The commit message became messy after
several rounds of editing across versions, and I failed to update all
the descriptions to match the final code.
> >
> > An alternative approach of snapshotting bq->count and bq->q[] before
> > releasing the producer_lock was considered, but it requires copying
> > the entire bq->q[] array on every flush, adding unnecessary overhead.
> >
> But you still have list_head which is not protected.
The snapshot alternative is incomplete as described. Will drop that paragraph.
> >
> > To reproduce, insert an mdelay(100) between spin_unlock() and
> > __list_del_clearprev() in bq_flush_to_queue(), then run reproducer
> > provided by syzkaller.
> >
> > Panic:
> > ===
> > BUG: kernel NULL pointer dereference, address: 0000000000000000
> > #PF: supervisor write access in kernel mode
> > #PF: error_code(0x0002) - not-present page
> > PGD 0 P4D 0
> > Oops: Oops: 0002 [#1] SMP PTI
> > CPU: 0 UID: 0 PID: 377 Comm: a.out Not tainted 6.19.0+ #21 PREEMPT_RT
> > RIP: 0010:bq_flush_to_queue+0x145/0x200
> > Call Trace:
> > <TASK>
> > __cpu_map_flush+0x2c/0x70
> > xdp_do_flush+0x64/0x1b0
> > xdp_test_run_batch.constprop.0+0x4d4/0x6d0
> > bpf_test_run_xdp_live+0x24b/0x3e0
> > bpf_prog_test_run_xdp+0x4a1/0x6e0
> > __sys_bpf+0x44a/0x2760
> > __x64_sys_bpf+0x1a/0x30
> > x64_sys_call+0x146c/0x26e0
> > do_syscall_64+0xd5/0x5a0
> > entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > </TASK>
> >
> This could be omitted. It is obvious once you see it. I somehow missed
Will remove the panic trace.
> this alloc_percpu instance while looking for this kind of bugs.
> Another one is hiding in devmap.c. Mind to take a look? I think I
> skip this entire folder…
I see the same pattern there - xdp_dev_bulk_queue is per-CPU with no preemption
protection in bq_enqueue() and __dev_flush().
> >
> > Fixes: 3253cb49cbad ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT")
> > Reported-by: syzbot+2b3391f44313b3983e91@xxxxxxxxxxxxxxxxxxxxxxxxx
> > Closes: https://lore.kernel.org/all/69369331.a70a0220.38f243.009d.GAE@xxxxxxxxxx/T/
> > Signed-off-by: Jiayuan Chen <jiayuan.chen@xxxxxxxxxx>
> > Signed-off-by: Jiayuan Chen <jiayuan.chen@xxxxxxxxx>
> > ---
> > v1 -> v2: https://lore.kernel.org/bpf/20260211064417.196401-1-jiayuan.chen@xxxxxxxxx/
> > - Use local_lock_nested_bh()/local_unlock_nested_bh() instead of
> > local_lock()/local_unlock(), since these paths already run under
> > local_bh_disable(). (Sebastian Andrzej Siewior)
> > - Replace "Caller must hold bq->bq_lock" comment with
> > lockdep_assert_held() in bq_flush_to_queue(). (Sebastian Andrzej Siewior)
> > - Fix Fixes tag to 3253cb49cbad ("softirq: Allow to drop the
> > softirq-BKL lock on PREEMPT_RT") which is the actual commit that
> > makes the race possible. (Sebastian Andrzej Siewior)
> > ---
> > kernel/bpf/cpumap.c | 17 +++++++++++++++--
> > 1 file changed, 15 insertions(+), 2 deletions(-)
> >
> The changes below look good.
Thanks!
> Sebastian