Re: [PATCH 1/3] sched_ext: fix NULL deref in bpf_scx_unreg() due to ops->priv race
From: zhidao su
Date: Thu Mar 26 2026 - 01:13:16 EST
On Thu, Mar 26, 2026 at 10:55:08AM +0800, Tejun Heo wrote:
> On Thu, Mar 26, 2026 at 10:28:25AM +0800, zhidao su wrote:
> > The reload_loop selftest triggers a KASAN null-ptr-deref at
> > scx_claim_exit+0x83 when two threads concurrently attach and
> > destroy BPF schedulers using the same ops map.
> ...
> Can you reproduce this? How do you trigger enable on the same ops that has
> already been enabled?
I investigated further and the analysis in patch 1/3 was wrong.
Please do not merge patches 1/3 and 2/3 from this series. Patch
3/3 is still valid and can be applied independently.
Patch 1/3 (NULL deref):
The race described in the commit message cannot occur. Both
bpf_struct_ops_link_create() and bpf_struct_ops_map_link_detach()
hold update_mutex when calling reg()/unreg(), so concurrent reg
and unreg on the same ops map are serialized. I ran 20 rounds of
reload_loop under KASAN with no crashes. The bug was never real.
I wrote the patch based on code analysis alone without first
obtaining KASAN output confirming the crash. That was wrong.
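The serialization argument above can be sketched as a userspace
analogy. This is not kernel code: toy_attach(), toy_detach(),
run_toy_race() and the two-step publish are hypothetical stand-ins,
and only the name update_mutex is taken from the description above.
The point is that a detach that takes the same mutex can never
observe a half-finished attach.

```c
/* Userspace analogy only; every name except update_mutex is hypothetical. */
#include <pthread.h>
#include <stddef.h>

#define ROUNDS 100000

static pthread_mutex_t update_mutex = PTHREAD_MUTEX_INITIALIZER;
static int priv_storage;
static int *ops_priv;     /* stands in for ops->priv */
static int registered;    /* set before ops_priv, so a gap exists without the lock */
static int violations;    /* detach saw registered != 0 with ops_priv == NULL */

/* Stand-in for the attach path: two-step publish under update_mutex. */
static void *toy_attach(void *arg)
{
	(void)arg;
	for (int i = 0; i < ROUNDS; i++) {
		pthread_mutex_lock(&update_mutex);
		if (!registered) {
			registered = 1;            /* step 1 */
			ops_priv = &priv_storage;  /* step 2 */
		}
		pthread_mutex_unlock(&update_mutex);
	}
	return NULL;
}

/* Stand-in for the detach path: checks and tears down under the same mutex. */
static void *toy_detach(void *arg)
{
	(void)arg;
	for (int i = 0; i < ROUNDS; i++) {
		pthread_mutex_lock(&update_mutex);
		if (registered) {
			if (ops_priv == NULL)
				violations++;  /* would be the NULL deref */
			ops_priv = NULL;
			registered = 0;
		}
		pthread_mutex_unlock(&update_mutex);
	}
	return NULL;
}

/* Runs both paths concurrently; returns the violation count, or -1 on error. */
static int run_toy_race(void)
{
	pthread_t a, d;

	registered = 0;
	ops_priv = NULL;
	violations = 0;

	if (pthread_create(&a, NULL, toy_attach, NULL) != 0)
		return -1;
	if (pthread_create(&d, NULL, toy_detach, NULL) != 0) {
		pthread_join(a, NULL);
		return -1;
	}
	pthread_join(a, NULL);
	pthread_join(d, NULL);
	return violations;
}
```

Called from a main() and built with -pthread, run_toy_race() returns
0 because both steps of the publish happen inside one critical
section. Dropping the lock from either path is what would open the
window the original commit message assumed.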
Patch 2/3 (dsq_reenq reliability):
I cannot reproduce a failure with the original test (NUM_WORKERS=4):
all 10 local iterations pass. The "reliability fix" had no verified
failure to fix and should not be merged.
Patch 3/3 (consume_immed reliability):
The original test fails consistently because ops.dispatch() moves
only one task per call, so dsq->nr never exceeds 1 and the IMMED
slow path in dsq_inc_nr() is never triggered. This is a real bug
in the test. Patch 3/3 is valid and can be considered independently.
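A toy model of the counter behaviour described above, to show why
one task per dispatch call caps the count at 1. This is not the
kernel implementation: struct toy_dsq, toy_dsq_inc_nr() and
toy_peak_nr() are hypothetical, and the model assumes the queue is
fully drained between dispatch calls.

```c
/* Toy model only; not kernel code. All names are hypothetical. */
struct toy_dsq {
	unsigned int nr;      /* tasks currently queued */
	unsigned int max_nr;  /* highest value nr ever reached */
};

/* Count one queued task and remember the peak. */
static void toy_dsq_inc_nr(struct toy_dsq *dsq)
{
	dsq->nr++;
	if (dsq->nr > dsq->max_nr)
		dsq->max_nr = dsq->nr;
}

/*
 * Simulate `calls` dispatch calls that each move `batch` tasks into
 * the queue. Assumption: the queue is emptied before the next call.
 * Returns the peak value of nr.
 */
static unsigned int toy_peak_nr(unsigned int calls, unsigned int batch)
{
	struct toy_dsq dsq = { 0, 0 };

	for (unsigned int c = 0; c < calls; c++) {
		for (unsigned int b = 0; b < batch; b++)
			toy_dsq_inc_nr(&dsq);
		dsq.nr = 0;  /* consumer drains everything queued */
	}
	return dsq.max_nr;
}
```

In this model toy_peak_nr(100, 1) is 1 and toy_peak_nr(100, 4) is 4,
so any path that depends on the count exceeding 1 is only reachable
when more than one task is queued at a time.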
Sorry for the noise.
zhidao