[PATCH] sched/deadline: Fix stale dl_defer_running in update_dl_entity() if-branch
From: soolaugust
Date: Fri Apr 03 2026 - 04:12:56 EST
From: zhidao su <suzhidao@xxxxxxxxxx>
commit 115135422562 ("sched/deadline: Fix 'stuck' dl_server") added a
dl_defer_running = 0 reset in the if-branch of update_dl_entity() to
handle the case where [4] D->A is followed by [1] A->B (lapsed
deadline). The intent was to ensure the server re-enters the zero-laxity
wait when restarted after the deadline has passed.
With Proxy Execution (PE), RT tasks proxied through the scheduler appear
to trigger frequent dl_server_start() calls with expired deadlines. When
this happens with dl_defer_running=1 (from a prior starvation episode),
the dl_defer_running reset added by that commit forces the fair_server
back through the ~950ms zero-laxity wait each time.
In our testing (virtme-ng, 4 CPUs, 4G RAM, ksched_football):
  With this patch:    ~1s for all players to check in
  Without this patch: ~28s for all players to check in
The issue appears to be that the clearing in update_dl_entity()'s
if-branch is too aggressive for the PE use case.
replenish_dl_new_period() already handles this via its internal guard:
    if (dl_se->dl_defer && !dl_se->dl_defer_running) {
        dl_se->dl_throttled = 1;
        dl_se->dl_defer_armed = 1;
    }
When dl_defer_running=1 (starvation previously confirmed by the
zero-laxity timer), replenish_dl_new_period() skips arming the
zero-laxity timer, allowing the server to run directly. This seems
correct: once starvation has been confirmed, subsequent start/stop
cycles triggered by PE should not re-introduce the deferral delay.
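To make the effect of the removed reset easier to see, here is a minimal
userspace sketch of the guard described above. It is not kernel code: the
struct model_dl_se type and the model_*() helpers are hypothetical names
invented for illustration, period forwarding and the timer itself are
omitted, and only the four flags discussed in this message are modeled.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the flags discussed above. */
struct model_dl_se {
	bool dl_defer;
	bool dl_defer_running;
	bool dl_throttled;
	bool dl_defer_armed;
};

/* Mirrors only the guard quoted above; forwarding the period is omitted. */
static void model_replenish_new_period(struct model_dl_se *se)
{
	if (se->dl_defer && !se->dl_defer_running) {
		se->dl_throttled = true;
		se->dl_defer_armed = true;
	}
}

/*
 * Models the if-branch of update_dl_entity() (assumed simplification).
 *   clear_running == true:  reset dl_defer_running first (before this patch)
 *   clear_running == false: leave dl_defer_running alone (with this patch)
 * Returns true if the server would have to wait for the zero-laxity timer.
 */
static bool model_update_if_branch(struct model_dl_se *se, bool clear_running)
{
	if (clear_running)
		se->dl_defer_running = false;
	model_replenish_new_period(se);
	return se->dl_defer_armed;
}
```

Starting from dl_defer=1 and dl_defer_running=1, calling
model_update_if_branch() with clear_running=true arms the deferral again,
while clear_running=false leaves the server unthrottled. With
dl_defer_running=0 both variants arm the deferral, so in this model the
patch only changes behavior after starvation has already been confirmed.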
Note: this is the same change as the HACK revert in John's PE series
(679ede58445 "HACK: Revert 'sched/deadline: Fix stuck dl_server'"),
but with the rationale documented.
The state machine comment is updated to reflect the actual behavior of
replenish_dl_new_period() when dl_defer_running=1.
Signed-off-by: zhidao su <suzhidao@xxxxxxxxxx>
---
kernel/sched/deadline.c | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 01754d699f0..30b03021fce 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1034,12 +1034,6 @@ static void update_dl_entity(struct sched_dl_entity *dl_se)
return;
}
- /*
- * When [4] D->A is followed by [1] A->B, dl_defer_running
- * needs to be cleared, otherwise it will fail to properly
- * start the zero-laxity timer.
- */
- dl_se->dl_defer_running = 0;
replenish_dl_new_period(dl_se, rq);
} else if (dl_server(dl_se) && dl_se->dl_defer) {
/*
@@ -1662,11 +1656,11 @@ void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec)
* enqueue_dl_entity()
* update_dl_entity(WAKEUP)
* if (dl_time_before() || dl_entity_overflow())
- * dl_defer_running = 0;
* replenish_dl_new_period();
* // fwd period
- * dl_throttled = 1;
- * dl_defer_armed = 1;
+ * if (!dl_defer_running)
+ * dl_throttled = 1;
+ * dl_defer_armed = 1;
* if (!dl_defer_running)
* dl_defer_armed = 1;
* dl_throttled = 1;
--
2.43.0