Re: [PATCH] sched/deadline: Make dl-server nohz full aware

From: Juri Lelli

Date: Wed May 13 2026 - 02:16:38 EST


On 12/05/26 17:34, Juri Lelli wrote:
> Hi Andrea,
>
> On 12/05/26 16:55, Andrea Righi wrote:
> > Hi Juri,
>
> Thanks for the quick review!
>
> > On Tue, May 12, 2026 at 11:02:37AM +0200, Juri Lelli wrote:
> > > The dl_server_timer() causes spurious IPIs on nohz_full cores, breaking
> > > isolation guarantees. The timer executes on a housekeeping core and
> > > eventually calls tick_nohz_dep_set_cpu(), sending IPIs to isolated cores
> > > even when only a single task is running.
> > >
> > > The problem is that dl-servers are not coordinated with nohz_full tick
> > > state. Timers can fire and send IPIs to otherwise undisturbed cores.
> > >
> > > Fix by managing servers in sched_can_stop_tick():
> > >
> > > - When RT tasks run with CFS/SCX tasks, start the appropriate server
> > > and keep the tick running
> > > - When only RT tasks remain, stop all servers and allow tick to stop
> > > (except for >1 RR tasks which need the tick for round-robin)
> > > - When only CFS/SCX tasks remain, stop all servers before stopping tick
> > >
> > > Introduce dl_servers_stop_all() to reduce duplication and abstract
> > > server management from core.c. Unify RT handling into one block that
> > > handles both RR and FIFO cases.
> > >
> > > Fixes: 557a6bfc662c ("sched/fair: Add trivial fair server")
> > > Reported-by: David Haufe <dhaufe@xxxxxxxxxxxxxxxxxx>
> > > Closes: https://lore.kernel.org/lkml/CAKJHwtOw_G67edzuHVtL1xC5Vyt6StcZzihtDd0yaKudW=rwVw@xxxxxxxxxxxxxx
> > > Signed-off-by: Juri Lelli <juri.lelli@xxxxxxxxxx>
> > > ---
> > > I had to modify my first original attempt at fixing this (please take a
> > > look at the linked report/discussion) to also take SCX into
> > > consideration.
> >
> > As mentioned by Frederic, we don't allow loading BPF schedulers when
> > isolcpus= is used, so I think we can simplify the sched_can_stop_tick() part.
>
> Right! Thanks for confirming.

Ah, but wait. IIUC, SCX is incompatible only with isolcpus=domain?
scx_can_stop_tick() seems to confirm that we still need to take care of
it when the domain flag is not present.

So maybe we still need to consider SCX in this patch, e.g. for
configurations that don't use static domain isolation but instead
isolate CPUs by configuring task affinities.
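
To make the cases concrete, here is a rough userspace model of the
decision table from the changelog, with SCX kept in the picture. It is
only a sketch, not the patch and not kernel code: struct rq_model, its
fields, servers_stop_all_model() and can_stop_tick_model() are all
hypothetical names, and the "only CFS/SCX tasks" branch assumes the tick
can stop when at most one such task is runnable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of one CPU's runqueue; not a kernel structure. */
struct rq_model {
	unsigned int nr_rr;	/* runnable SCHED_RR tasks */
	unsigned int nr_fifo;	/* runnable SCHED_FIFO tasks */
	unsigned int nr_fair;	/* runnable CFS tasks */
	unsigned int nr_scx;	/* runnable sched_ext tasks */
	bool fair_server_on;	/* models the fair dl-server being active */
	bool ext_server_on;	/* models an SCX dl-server being active */
};

/* Models the idea behind dl_servers_stop_all(): no server stays armed. */
static void servers_stop_all_model(struct rq_model *rq)
{
	rq->fair_server_on = false;
	rq->ext_server_on = false;
}

/* Returns true when, in this model, the tick may be stopped. */
static bool can_stop_tick_model(struct rq_model *rq)
{
	unsigned int nr_rt, nr_other;

	assert(rq);
	nr_rt = rq->nr_rr + rq->nr_fifo;
	nr_other = rq->nr_fair + rq->nr_scx;

	if (nr_rt && nr_other) {
		/* RT could starve lower classes: arm the matching server
		 * and keep the tick running. */
		rq->fair_server_on = rq->nr_fair > 0;
		rq->ext_server_on = rq->nr_scx > 0;
		return false;
	}

	/* No RT/non-RT mix, so no server is needed. */
	servers_stop_all_model(rq);

	if (nr_rt)
		return rq->nr_rr <= 1;	/* >1 RR tasks need the tick */

	/* Assumption: a lone CFS/SCX task (or an idle CPU) needs no tick. */
	return nr_other <= 1;
}
```

If SCX really is a concern only without the domain flag, dropping the
nr_scx handling above would be the simplification Andrea suggested;
keeping it covers the affinity-based isolation case.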