Re: [PATCH] sched/deadline: Fix stale dl_defer_running in update_dl_entity() if-branch
From: John Stultz
Date: Fri Apr 03 2026 - 15:37:19 EST
On Fri, Apr 3, 2026 at 6:43 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Fri, Apr 03, 2026 at 04:12:15PM +0800, soolaugust@xxxxxxxxx wrote:
> > From: zhidao su <suzhidao@xxxxxxxxxx>
> >
> > commit 115135422562 ("sched/deadline: Fix 'stuck' dl_server") added a
> > dl_defer_running = 0 reset in the if-branch of update_dl_entity() to
> > handle the case where [4] D->A is followed by [1] A->B (lapsed
> > deadline). The intent was to ensure the server re-enters the zero-laxity
> > wait when restarted after the deadline has passed.
> >
> > With Proxy Execution (PE), RT tasks proxied through the scheduler appear
> > to trigger frequent dl_server_start() calls with expired deadlines. When
> > this happens with dl_defer_running=1 (from a prior starvation episode),
> > Peter's fix forces the fair_server back through the ~950ms zero-laxity
> > wait each time.
> >
> > In our testing (virtme-ng, 4 CPUs, 4G RAM, ksched_football):
> > With this fix: ~1s for all players to check in
> > Without this fix: ~28s for all players to check in
> >
> > The issue appears to be that the clearing in update_dl_entity()'s
> > if-branch is too aggressive for the PE use case.
> > replenish_dl_new_period() already handles this via its internal guard:
> >
> > if (dl_se->dl_defer && !dl_se->dl_defer_running) {
> > dl_se->dl_throttled = 1;
> > dl_se->dl_defer_armed = 1;
> > }
> >
> > When dl_defer_running=1 (starvation previously confirmed by the
> > zero-laxity timer), replenish_dl_new_period() skips arming the
> > zero-laxity timer, allowing the server to run directly. This seems
> > correct: once starvation has been confirmed, subsequent start/stop
> > cycles triggered by PE should not re-introduce the deferral delay.
> >
> > Note: this is the same change as the HACK revert in John's PE series
> > (679ede58445 "HACK: Revert 'sched/deadline: Fix stuck dl_server'"),
> > but with the rationale documented.
> >
> > The state machine comment is updated to reflect the actual behavior of
> > replenish_dl_new_period() when dl_defer_running=1.
> >
> > Signed-off-by: zhidao su <suzhidao@xxxxxxxxxx>
> > ---
> > kernel/sched/deadline.c | 12 +++---------
> > 1 file changed, 3 insertions(+), 9 deletions(-)
> >
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index 01754d699f0..30b03021fce 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1034,12 +1034,6 @@ static void update_dl_entity(struct sched_dl_entity *dl_se)
> > return;
> > }
> >
> > - /*
> > - * When [4] D->A is followed by [1] A->B, dl_defer_running
> > - * needs to be cleared, otherwise it will fail to properly
> > - * start the zero-laxity timer.
> > - */
> > - dl_se->dl_defer_running = 0;
> > replenish_dl_new_period(dl_se, rq);
> > } else if (dl_server(dl_se) && dl_se->dl_defer) {
> > /*
>
> This cannot be right; it will insta break Andrea's test case again.
>
> And I cannot make sense of your explanation; how does PE cause what to
> happen? You mention PROXY_WAKING, this then means proxy_force_return().
>
> I suspect whatever it is you're seeing will go away once we delete that
> thing, see this discussion:
>
> https://lkml.kernel.org/r/20260402155055.GV3738010@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
So unfortunately, this doesn't seem to be proxy-exec related at all.
It's almost identical to the issue I had a while back with the
dl_server, when spawning RT spinner threads (as kthreadd doesn't run
as RT):
https://lore.kernel.org/lkml/CANDhNCqK3VBAxxWMsDez8xkX0vcTStWjRMR95pksUM6Q26Ctyw@xxxxxxxxxxxxxx/
Now, this is with my out-of-tree ksched_football test, which is a bit
quirky. I've updated a branch with my test here against 7.0-rc5:
https://github.com/johnstultz-work/linux-dev/commits/ksched-football-dl_server-issue/
It runs at boot, but you can also re-run it via "echo 10 >
/sys/kernel/ksched_football/start_game"
The idea behind the test is that a high-priority "Ref" thread spawns
players, from low priority to high, that just spin on the CPU. The
issue is that once NR_CPU low-prio players start, they starve any
additional higher-prio players from starting (despite the highest
priority Ref spawning them), because kthreadd is not RT. So the test
effectively relies on the dl_server to kick in and let the rest of the
players spawn. This isn't actually what the test is testing; it's just
how the test gets set up to run.
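Roughly, the spawn path looks something like this (a very simplified
sketch, not the actual test code; player_fn/spawn_player are made-up
names for illustration):

```c
/*
 * Very simplified sketch: the Ref thread creates each player via
 * kthread_run(), but kthread_run() only hands a creation request to
 * kthreadd, which runs as SCHED_NORMAL.  Once NR_CPUS FIFO spinners
 * occupy the CPUs, kthreadd (and thus every later spawn) can only
 * make progress when the dl_server runs it.
 */
static int player_fn(void *arg)
{
	while (!kthread_should_stop())
		cpu_relax();		/* just spin on the CPU */
	return 0;
}

static void spawn_player(int prio)
{
	struct sched_param param = { .sched_priority = prio };
	struct task_struct *p;

	p = kthread_run(player_fn, NULL, "player-%d", prio);	/* waits on kthreadd */
	if (!IS_ERR(p))
		sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
}
```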
Using an 8-CPU VM with CONFIG_SCHED_PROXY_EXEC disabled:
With commit 115135422562 ("sched/deadline: Fix 'stuck' dl_server")
reverted, I see the (expected, maybe) behavior where the starvation
lasts ~1 second, then the dl_server allows all the threads to spawn
right away, and then the test runs for 10 seconds.
See perfetto chart:
https://ui.perfetto.dev/#!/?s=a729fd2dd4b224d6335c5b2e727dc1a1c302c11a
(click the Kernel-threads track and scroll down to see the test
threads named referee/defense/offense/crazy-fan)
With commit 115135422562 ("sched/deadline: Fix 'stuck' dl_server")
applied, the dl_server boosting of the kthreadd spawning seems much
more staggered. Again we spin up NR_CPU low priority threads and see
~1 second of starvation, then we spawn one of the mid threads, then
another one-second delay, then a two-second delay before we get the
third running, then a small burst of 5 threads at once, then it falls
back to one second or more per thread as it spawns off the rest. All
in all it takes ~44 seconds just to spawn the threads before running
the test.
Perfetto chart:
https://ui.perfetto.dev/#!/?s=ab8e487375d0c82ceea478ee4534a7189269c0d4
With higher CPU counts (64), the test effectively prevents the system
from booting (it trips the hung task watchdog).
I haven't really diagnosed the issue, but it feels a little like the
dl_server is boosting until the fair rq is empty, but then giving up
the rest of its time; so if a fair task runs repeatedly but only for
very short stretches, it won't get to run again until the next
dl_server period, causing this rate-limiting one-task-per-second
effect for thread spawning? I still need to stare at the dl_server
logic some more.
thanks
-john