Re: [PATCH 4/5] sched_ext: Fix scx_sched_lock / rq lock ordering

From: Andrea Righi

Date: Tue Mar 10 2026 - 02:49:57 EST


On Tue, Mar 10, 2026 at 07:39:10AM +0100, Andrea Righi wrote:
> Hi Tejun,
>
> On Mon, Mar 09, 2026 at 03:16:52PM -1000, Tejun Heo wrote:
> > There are two sites that nest rq lock inside scx_sched_lock:
> >
> > - scx_bypass() takes scx_sched_lock then rq lock per CPU to propagate
> > per-cpu bypass flags and re-enqueue tasks.
> >
> > - sysrq_handle_sched_ext_dump() takes scx_sched_lock to iterate all
> > scheds, scx_dump_state() then takes rq lock per CPU for dump.
> >
> > And scx_claim_exit() takes scx_sched_lock to propagate exits to
> > descendants. It can be reached from scx_tick(), BPF kfuncs, and many
> > other paths with rq lock already held, creating the reverse ordering:
> >
> > rq lock -> scx_sched_lock vs. scx_sched_lock -> rq lock
> >
> > Fix by flipping scx_bypass() to take rq lock first, and dropping
> > scx_sched_lock from sysrq_handle_sched_ext_dump() as scx_sched_all is
> > already RCU-traversable and scx_dump_lock now prevents dumping a dead
> > sched. This makes the ordering consistent: rq lock -> scx_sched_lock.
> >
> > Reported-by: Cheng-Yang Chou <yphbchou0911@xxxxxxxxx>
> > Link: http://lkml.kernel.org/r/20260309163025.2240221-1-yphbchou0911@xxxxxxxxx
> > Fixes: ebeca1f930ea ("sched_ext: Introduce cgroup sub-sched support")
> > Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> > ---
> > kernel/sched/ext.c | 4 +---
> > 1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> > index cf28a8f62ad0..677c1c6c64bf 100644
> > --- a/kernel/sched/ext.c
> > +++ b/kernel/sched/ext.c
> > @@ -5097,8 +5097,8 @@ static void scx_bypass(struct scx_sched *sch, bool bypass)
> > struct rq *rq = cpu_rq(cpu);
> > struct task_struct *p, *n;
> >
> > - raw_spin_lock(&scx_sched_lock);
> > raw_spin_rq_lock(rq);
> > + raw_spin_lock(&scx_sched_lock);
> >
> > scx_for_each_descendant_pre(pos, sch) {
> > struct scx_sched_pcpu *pcpu = per_cpu_ptr(pos->pcpu, cpu);
> > @@ -7240,8 +7240,6 @@ static void sysrq_handle_sched_ext_dump(u8 key)
> > struct scx_exit_info ei = { .kind = SCX_EXIT_NONE, .reason = "SysRq-D" };
> > struct scx_sched *sch;
> >
> > - guard(raw_spinlock_irqsave)(&scx_sched_lock);
> > -
>
> Don't we need RCU protection here?

Nevermind, __handle_sysrq() is already doing rcu_read_lock/unlock(), so
this looks good.

Sorry for the noise,
-Andrea

>
> > list_for_each_entry_rcu(sch, &scx_sched_all, all)
> > scx_dump_state(sch, &ei, 0, false);
> > }
>
> Thanks,
> -Andrea