Re: [PATCH RESEND bpf-next v10 2/8] bpf: clear list node owner and unlink before drop
From: Alexei Starovoitov
Date: Wed May 13 2026 - 21:50:36 EST
On Wed May 13, 2026 at 3:53 PM PDT, Eduard Zingerman wrote:
> On Tue, 2026-05-12 at 06:41 +0000, bot+bpf-ci@xxxxxxxxxx wrote:
>
> [...]
>
>> When a BPF program holds an owning or refcount-acquired reference to
>> one of these nodes (node X), which is structurally supported because
>> __bpf_obj_drop_impl() uses refcount_dec_and_test() and only frees at
>> refcount 0, a concurrent push to a DIFFERENT bpf_list_head becomes a
>> corruption:
>>
>>   CPU 0 (bpf_list_head_free, lock released)   CPU 1 (BPF prog, refcount X)
>>   -----------------------------------------   ----------------------------
>>   (owner of X == NULL, X linked in drain)
>>                                               bpf_list_push_back(other, X)
>>                                                 __bpf_list_add: spin_lock()
>>                                                 cmpxchg(X->owner, NULL,
>>                                                         POISON) -> OK
>>                                                 list_add_tail(&X->list_head,
>>                                                               other_head)
>>                                                   -> overwrites X->next,
>>                                                      X->prev, corrupts
>>                                                      other_head's chain
>>                                                      because X is still
>>                                                      stitched into drain
>>   pos = drain.next;    (may be X or neighbor using X's stale next)
>>   list_del_init(pos);  reads X->next/prev now pointing into other_head,
>>                        corrupts other_head's list and/or drain
>
>
> Kaitao, this scenario seems plausible, could you please comment on it?
I think the bot is correct.
This patch looks buggy.
It looks like an optimization that breaks the concurrency logic.
Maybe just drop this patch and reorder the other one, so that the bot
sees the nonown suffix logic first.
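
For illustration, the interleaving in the bot's diagram can be replayed
single-threaded in userspace. This is only a sketch under stated
assumptions, not kernel code: the list helpers are simplified stand-ins
modeled on include/linux/list.h, the nodes A, B, C and the function
replay_interleaving() are hypothetical names, and the owner cmpxchg and
the spin lock are left out because both succeed in the reported scenario.

```c
#include <stdbool.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head->prev;

	head->prev = new;
	new->next = head;
	new->prev = prev;
	prev->next = new;
}

static void list_del_init(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	INIT_LIST_HEAD(entry);
}

/* Hypothetical result type: which symptoms of corruption were observed. */
struct replay_result {
	bool drain_points_into_other; /* drain.next == &other */
	bool other_prev_is_drain;     /* other.prev == &drain */
	bool other_chain_self_loops;  /* other -> C -> X -> X -> ... */
};

static struct replay_result replay_interleaving(void)
{
	struct list_head drain, other, A, X, B, C;
	struct replay_result r;

	/* drain: A <-> X <-> B, other: C. X's owner is assumed NULL already. */
	INIT_LIST_HEAD(&drain);
	INIT_LIST_HEAD(&other);
	list_add_tail(&A, &drain);
	list_add_tail(&X, &drain);
	list_add_tail(&B, &drain);
	list_add_tail(&C, &other);

	/* "CPU 1": push X onto other while X is still stitched into drain. */
	list_add_tail(&X, &other);

	/* "CPU 0": keep draining; A's stale next still points at X. */
	list_del_init(drain.next); /* removes A */
	list_del_init(drain.next); /* removes X using links into other */

	r.drain_points_into_other = (drain.next == &other);
	r.other_prev_is_drain = (other.prev == &drain);
	r.other_chain_self_loops = (C.next == &X && X.next == &X);
	return r;
}
```

In this replay drain.next ends up pointing at the other list's head and a
forward walk of other never returns to its head, which matches the
"corrupts other_head's list and/or drain" outcome in the report.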