Re: [PATCH bpf-next] bpf: reuseport: add cond_resched_rcu() in reuseport_array_free()

From: Alexei Starovoitov

Date: Fri Apr 10 2026 - 15:59:56 EST


On Fri, Apr 10, 2026 at 7:07 AM Zijing Yin <yzjaurora@xxxxxxxxx> wrote:
>
> reuseport_array_free() iterates over all map entries under
> rcu_read_lock() to detach sockets from the array. When max_entries is
> very large (e.g., hundreds of millions), this loop runs for an extended
> period without yielding the CPU, triggering RCU stall warnings in the
> kworker thread that executes bpf_map_free_deferred().
>
> The observed stall occurs because the loop has no scheduling point:
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> Workqueue: events_unbound bpf_map_free_deferred
> Call Trace:
> reuseport_array_free+0x1ec/0x470 kernel/bpf/reuseport_array.c:127
> bpf_map_free_deferred+0x34a/0x7e0 kernel/bpf/syscall.c:893
> process_one_work+0x952/0x1a80
> worker_thread+0x87b/0x11f0
>
> Add cond_resched_rcu() to the loop body so the scheduler can run and
> RCU grace periods can complete. This is safe because each iteration
> processes a single entry independently, sk->sk_callback_lock is not
> held across the yield point, and the map has been fully detached from
> userspace, so no concurrent insertions can occur.
>
> This follows an established pattern for long-running kernel loops that
> must run under rcu_read_lock(). The closest precedent is in another BPF
> map free function:
>
> kernel/bpf/hashtab.c:1600
> htab_free_malloced_internal_structs():
>         rcu_read_lock();
>         for (i = 0; i < htab->n_buckets; i++) {
>                 ... walk bucket ...
>                 cond_resched_rcu();
>         }
>         rcu_read_unlock();
>
> Fixes: 5dc4c4b7d4e8 ("bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY")
> Signed-off-by: Zijing Yin <yzjaurora@xxxxxxxxx>
> ---
> Base: bpf-next.git master branch
> (tip a0c584fc18056709c8e047a82a6045d6c209f4ce
> "bpf: Fix use-after-free in offloaded map/prog info fill"
> as of 2026-04-09).
>
> Tested with CONFIG_PREEMPT_RCU=y, CONFIG_KASAN=y (inline),
> CONFIG_SMP=n (single vCPU QEMU VM), gcc 13.3.0.
>
> To reproduce: create a BPF_MAP_TYPE_REUSEPORT_SOCKARRAY with
> max_entries >= 100M and set rcu_cpu_stall_timeout low. Pin the CPU
> with a SCHED_FIFO thread so the kworker stays inside rcu_read_lock()
> long enough to trip the stall timeout, then close the map fd. Without
> the fix, the reuseport_array_free() kworker reliably stalls RCU; with
> the fix, cond_resched_rcu() yields periodically and no stall is
> observed.
> Reproducer (C source): repro_reuseport.c (https://pastebin.com/YjdwqdX1)

This is not a realistic scenario that is worth fixing.

pw-bot: cr