[PATCH] sched/deadline: Fix stale dl_defer_running in dl_server else-branch

From: soolaugust

Date: Thu Apr 02 2026 - 09:40:17 EST


From: Zhidao Su <suzhidao@xxxxxxxxxx>

Peter's fix (115135422562) cleared dl_defer_running in the if-branch of
update_dl_entity() (deadline expired/overflow). This ensures
replenish_dl_new_period() always arms the zero-laxity timer. However,
with PROXY_WAKING, re-activation hits the else-branch (same-period,
deadline not expired), where dl_defer_running from a prior starvation
episode can be stale.

During PROXY_WAKING CPU return-migration, proxy_force_return() migrates
the task to a new CPU via deactivate_task()+attach_one_task(). The
enqueue path on the new CPU triggers enqueue_task_fair() which calls
dl_server_start() for the fair_server. Crucially, this re-activation
does NOT call dl_server_stop() first, so dl_defer_running retains its
prior value. If a prior starvation episode left dl_defer_running=1,
and the server is re-activated within the same period:

  [4] D->A: dl_server_stop() clears flags but may be skipped when
            dl_server_active=0 (server was already stopped before
            return-migration triggered dl_server_start())
  [1] A->B: dl_server_start() -> enqueue_dl_entity(WAKEUP)
            -> update_dl_entity() enters else-branch
            -> 'if (!dl_defer_running)' guard is false, so
               dl_defer_armed=1 / dl_throttled=1 is skipped
            -> server enqueued into [D] state directly
            -> update_curr_dl_se() consumes runtime
            -> start_dl_timer() with dl_defer_armed=0 (slow path)
            -> boot time increases ~72%

Fix: in the else-branch, unconditionally clear dl_defer_running and always
set dl_defer_armed=1 / dl_throttled=1. This ensures every same-period
re-activation properly re-arms the zero-laxity timer, regardless of whether
a prior starvation episode had set dl_defer_running.

In the if-branch (deadline expired), the unconditional clear added by
Peter's fix is dropped. replenish_dl_new_period() contains its own guard
('if (!dl_defer_running)') that arms the zero-laxity timer only when
dl_defer_running=0. With PROXY_WAKING, dl_defer_running=1 in the
deadline-expired path means a genuine starvation episode is ongoing, so
the server can skip the zero-laxity wait and enter [D] directly. Clearing
dl_defer_running there (as Peter's fix did) forces every PROXY_WAKING
deadline-expired re-activation through the ~950ms zero-laxity wait.

Measured boot time to first ksched_football event (4 CPUs, 4G):
  With this fix:                         ~15-20s
  Without fix (stale dl_defer_running):  ~43-62s (+72-200%)

Note: Andrea Righi's v2 patch addresses the same symptom by clearing
dl_defer_running in dl_server_stop(). However, dl_server_stop() is not
called during PROXY_WAKING return-migration: proxy_force_return() reaches
dl_server_start() through attach_one_task()->enqueue_task_fair() with no
preceding dl_server_stop(). This fix instead targets the else-branch of
update_dl_entity(), which that re-activation does go through.

Signed-off-by: Zhidao Su <suzhidao@xxxxxxxxxx>
---
kernel/sched/deadline.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 01754d699f0..b2bcd34f3ea 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1034,22 +1034,22 @@ static void update_dl_entity(struct sched_dl_entity *dl_se)
 			return;
 		}
 
-		/*
-		 * When [4] D->A is followed by [1] A->B, dl_defer_running
-		 * needs to be cleared, otherwise it will fail to properly
-		 * start the zero-laxity timer.
-		 */
-		dl_se->dl_defer_running = 0;
 		replenish_dl_new_period(dl_se, rq);
 	} else if (dl_server(dl_se) && dl_se->dl_defer) {
 		/*
-		 * The server can still use its previous deadline, so check if
-		 * it left the dl_defer_running state.
+		 * The server can still use its previous deadline. Clear
+		 * dl_defer_running unconditionally: a stale dl_defer_running=1
+		 * from a prior starvation episode (set in dl_server_timer() when
+		 * the zero-laxity timer fires) must not carry over to the next
+		 * activation. PROXY_WAKING return-migration (proxy_force_return)
+		 * re-activates the server via attach_one_task()->enqueue_task_fair()
+		 * without calling dl_server_stop() first, so the flag is not
+		 * cleared in the [4] D->A path for that case.
+		 * Always re-arm the zero-laxity timer on each re-activation.
 		 */
-		if (!dl_se->dl_defer_running) {
-			dl_se->dl_defer_armed = 1;
-			dl_se->dl_throttled = 1;
-		}
+		dl_se->dl_defer_running = 0;
+		dl_se->dl_defer_armed = 1;
+		dl_se->dl_throttled = 1;
 	}
 }

--
2.43.0