Re: [PATCH bpf v2] bpf: cpumap: fix race in bq_flush_to_queue on PREEMPT_RT

From: Sebastian Andrzej Siewior

Date: Thu Feb 12 2026 - 09:34:17 EST


On 2026-02-12 10:36:33 [+0800], Jiayuan Chen wrote:
> From: Jiayuan Chen <jiayuan.chen@xxxxxxxxxx>
>

> local_bh_disable() only calls migrate_disable() and does not disable
> preemption. spin_lock() also becomes a sleeping rt_mutex. Together,
There is no spin_lock() here; otherwise there would be no problem.


> Fix this by adding a local_lock_t to xdp_bulk_queue and acquiring it
> in bq_enqueue() and __cpu_map_flush(). On non-RT kernels, local_lock
> maps to preempt_disable/enable with zero additional overhead. On
> PREEMPT_RT, it provides a per-CPU sleeping lock that serializes
> access to the bq. Use local_lock_nested_bh() since these paths already
> run under local_bh_disable().

So you use local_lock_nested_bh() and not local_lock(), yet you describe
local_lock. The difference is that the former does not add any
preempt_disable() on !RT. At this point I am curious how much of this
was written by you and how much was auto-generated.
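
For reference, a rough sketch of what the nested-BH usage presumably looks
like (the existing xdp_bulk_queue fields are from cpumap.c; the bq_lock
member placement and the body shown here are my guess at the patch, not the
actual diff):

```c
struct xdp_bulk_queue {
	void *q[CPU_MAP_BULK_SIZE];
	struct list_head flush_node;
	struct bpf_cpu_map_entry *obj;
	local_lock_t bq_lock;	/* assumed new member from the patch */
	unsigned int count;
};

static void bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf)
{
	struct xdp_bulk_queue *bq = this_cpu_ptr(rcpu->bulkq);

	/* Callers already run under local_bh_disable(), so the nested-BH
	 * variant suffices: it is a lockdep annotation only on !RT (no
	 * extra preempt_disable()), and a real per-CPU lock on PREEMPT_RT.
	 */
	local_lock_nested_bh(&bq->bq_lock);
	if (bq->count == CPU_MAP_BULK_SIZE)
		bq_flush_to_queue(bq);
	bq->q[bq->count++] = xdpf;
	/* flush_node list handling elided */
	local_unlock_nested_bh(&bq->bq_lock);
}
```
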

> An alternative approach of snapshotting bq->count and bq->q[] before
> releasing the producer_lock was considered, but it requires copying
> the entire bq->q[] array on every flush, adding unnecessary overhead.

But you still have the flush_node list_head, which is not protected.

> To reproduce, insert an mdelay(100) between spin_unlock() and
> __list_del_clearprev() in bq_flush_to_queue(), then run reproducer
> provided by syzkaller.
>
> Panic:
> ===
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0002) - not-present page
> PGD 0 P4D 0
> Oops: Oops: 0002 [#1] SMP PTI
> CPU: 0 UID: 0 PID: 377 Comm: a.out Not tainted 6.19.0+ #21 PREEMPT_RT
> RIP: 0010:bq_flush_to_queue+0x145/0x200
> Call Trace:
> <TASK>
> __cpu_map_flush+0x2c/0x70
> xdp_do_flush+0x64/0x1b0
> xdp_test_run_batch.constprop.0+0x4d4/0x6d0
> bpf_test_run_xdp_live+0x24b/0x3e0
> bpf_prog_test_run_xdp+0x4a1/0x6e0
> __sys_bpf+0x44a/0x2760
> __x64_sys_bpf+0x1a/0x30
> x64_sys_call+0x146c/0x26e0
> do_syscall_64+0xd5/0x5a0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> </TASK>

This could be omitted. It is obvious once you see it. I somehow missed
this alloc_percpu instance while looking for this kind of bug.
Another one is hiding in devmap.c. Would you mind taking a look? I think I
skipped this entire folder…

> Fixes: 3253cb49cbad ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT")
> Reported-by: syzbot+2b3391f44313b3983e91@xxxxxxxxxxxxxxxxxxxxxxxxx
> Closes: https://lore.kernel.org/all/69369331.a70a0220.38f243.009d.GAE@xxxxxxxxxx/T/
> Signed-off-by: Jiayuan Chen <jiayuan.chen@xxxxxxxxxx>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@xxxxxxxxx>
> ---
> v1 -> v2: https://lore.kernel.org/bpf/20260211064417.196401-1-jiayuan.chen@xxxxxxxxx/
> - Use local_lock_nested_bh()/local_unlock_nested_bh() instead of
> local_lock()/local_unlock(), since these paths already run under
> local_bh_disable(). (Sebastian Andrzej Siewior)
> - Replace "Caller must hold bq->bq_lock" comment with
> lockdep_assert_held() in bq_flush_to_queue(). (Sebastian Andrzej Siewior)
> - Fix Fixes tag to 3253cb49cbad ("softirq: Allow to drop the
> softirq-BKL lock on PREEMPT_RT") which is the actual commit that
> makes the race possible. (Sebastian Andrzej Siewior)
> ---
> kernel/bpf/cpumap.c | 17 +++++++++++++++--
> 1 file changed, 15 insertions(+), 2 deletions(-)

The changes below look good.

Sebastian