Re: [PATCH 1/2] sched_ext: Auto-register/unregister dl_server reservations
From: Andrea Righi
Date: Thu May 28 2026 - 12:44:38 EST
Hi Peter,
On Thu, May 28, 2026 at 01:36:21PM +0200, Peter Zijlstra wrote:
> On Tue, May 26, 2026 at 06:42:48PM +0200, Andrea Righi wrote:
> > @@ -6187,10 +6190,34 @@ static void scx_root_disable(struct scx_sched *sch)
> > /*
> > * Invalidate all the rq clocks to prevent getting outdated
> > * rq clocks from a previous scx scheduler.
> > + *
> > + * Also re-balance the dl_server bandwidth reservations: detach
> > + * ext_server (no more sched_ext tasks) and reinstate fair_server if it
> > + * was previously detached because we were running in full mode.
> > + *
> > + * Unlike the enable path, this runs on a recovery path that cannot
> > + * fail, so we use dl_server_swap_bw() to atomically free ext_server's
> > + * bandwidth and reclaim it for fair_server under the same dl_b lock.
> > + *
> > + * The swap can still fail with -EBUSY if someone bumped ext_server's
> > + * runtime via debugfs between enable and disable; in that narrow case
> > + * both servers end up detached and we just WARN.
> > */
> > for_each_possible_cpu(cpu) {
> > struct rq *rq = cpu_rq(cpu);
> > +
> > scx_rq_clock_invalidate(rq);
> > +
> > + scoped_guard(rq_lock_irqsave, rq) {
> > + update_rq_clock(rq);
> > + if (was_switched_all) {
> > + if (WARN_ON_ONCE(dl_server_swap_bw(&rq->ext_server,
> > + &rq->fair_server)))
> > + pr_warn("failed to re-attach fair_server on CPU %d\n", cpu);
>
> One option here, with the swap, is to reduce the fair server's bandwidth
> to match the outgoing ext server. Then at least you end up with the fair
> server running, rather than having it completely stopped.
>
> But this is going to be a rather rare occurrence, and people will have
> to go poke at the debugfs controls anyway if this happens, so maybe
> that's just not worth the effort.
>
> But I wanted to mention it...
Yeah, it'd be safer to keep at least "some" bandwidth attached if
dl_server_swap_bw() fails, so that fair isn't left completely unprotected.
On top of that, we could even try to opportunistically restore the original
bandwidth whenever DL bandwidth is released. But as you say, this is probably
a rare scenario, so maybe it could be a later follow-up improvement?
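For illustration only, here is a minimal user-space sketch of the fallback
discussed above: if the swap fails with -EBUSY, shrink the incoming server's
bandwidth to what the outgoing server just released and attach that instead.
Everything here is a hypothetical toy model (struct toy_dl_bw, struct
toy_server and the toy_*() helpers are invented names, not the kernel's dl_bw
accounting or the dl_server_swap_bw() from the patch), and it ignores locking
entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy stand-in for a DL bandwidth pool; NOT the kernel's struct dl_bw. */
struct toy_dl_bw {
	uint64_t cap;	/* maximum bandwidth that may be reserved */
	uint64_t total;	/* bandwidth currently reserved */
};

/* Toy stand-in for a dl_server reservation (hypothetical). */
struct toy_server {
	uint64_t bw;	/* bandwidth this server wants to reserve */
	int attached;	/* non-zero while bw is accounted in the pool */
};

/* Reserve s->bw in the pool, or fail with -EBUSY if it does not fit. */
static int toy_attach(struct toy_dl_bw *b, struct toy_server *s)
{
	if (s->attached)
		return 0;
	if (b->total + s->bw > b->cap)
		return -EBUSY;
	b->total += s->bw;
	s->attached = 1;
	return 0;
}

/* Release s->bw from the pool if it is currently accounted. */
static void toy_detach(struct toy_dl_bw *b, struct toy_server *s)
{
	if (!s->attached)
		return;
	assert(b->total >= s->bw);
	b->total -= s->bw;
	s->attached = 0;
}

/*
 * Model of the swap as described in the patch comment: release @from, then
 * try to reserve @to. On -EBUSY both servers are left detached.
 */
static int toy_swap_bw(struct toy_dl_bw *b, struct toy_server *from,
		       struct toy_server *to)
{
	toy_detach(b, from);
	return toy_attach(b, to);
}

/*
 * Fallback variant: if the swap fails, clamp @to's bandwidth to what @from
 * just released, so that @to ends up attached with a reduced reservation
 * rather than not attached at all.
 */
static int toy_swap_bw_or_shrink(struct toy_dl_bw *b, struct toy_server *from,
				 struct toy_server *to)
{
	uint64_t from_bw = from->attached ? from->bw : 0;
	int ret = toy_swap_bw(b, from, to);

	if (ret != -EBUSY || !from_bw)
		return ret;

	if (to->bw > from_bw)
		to->bw = from_bw;
	return toy_attach(b, to);
}
```

In this toy model the retry always fits, because the clamped bandwidth was
released a moment earlier and nothing else can claim it in between. In the
kernel that would only hold if both steps run under the same dl_b lock.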
>
> > + } else {
> > + dl_server_detach_bw(&rq->ext_server);
> > + }
> > + }
> > }
> >
> > /* no task is on scx, turn off all the switches and flush in-progress calls */
Thanks,
-Andrea