[PATCH RFC 0/4] sched/deadline: Add soft/reclaim mode via SCHED_OTHER demotion

From: Juri Lelli

Date: Thu Feb 19 2026 - 08:38:38 EST


Hi All,

This RFC introduces a bandwidth reclaiming mechanism for SCHED_DEADLINE
tasks through temporary demotion to SCHED_NORMAL when runtime is
exhausted. This resurrects and refines the demotion concept from the
original SCHED_DEADLINE development circa 2010, focusing exclusively on
SCHED_NORMAL demotion.

Discussions about the feature have resurfaced over the years, and I
wanted to check its feasibility and whether there is real interest. I
found a little time to play around with the idea, and this is the result.

When a DEADLINE task with SCHED_FLAG_DL_DEMOTION exhausts its runtime
budget, the scheduler demotes it to SCHED_NORMAL rather than throttling
it until the next period. The task continues execution competing fairly
with other normal tasks, using the nice value specified in
sched_attr.sched_nice. At the next period boundary, the replenishment
timer automatically promotes the task back to SCHED_DEADLINE with a
fresh runtime budget.
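A minimal userspace sketch of how a task might opt in. It assumes the uapi flag is exposed as SCHED_FLAG_DL_DEMOTION; the numeric value used below (0x80) is a placeholder, not taken from the series. The struct mirrors the 48-byte VER0 layout of struct sched_attr under a hypothetical local name (demo_sched_attr) to avoid clashing with libc headers, and demo_fill_attr()/demo_set_attr() are illustrative helpers only.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for syscall() */
#endif
#include <assert.h>
#include <sched.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6	/* value from the kernel uapi */
#endif

/* ASSUMPTION: placeholder bit; the real value comes from the series'
 * include/uapi/linux/sched.h and may differ. */
#ifndef SCHED_FLAG_DL_DEMOTION
#define SCHED_FLAG_DL_DEMOTION 0x80ULL
#endif

/* Local copy of the 48-byte VER0 layout of struct sched_attr,
 * renamed to avoid clashing with libc or kernel headers. */
struct demo_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;		/* nice value applied while demoted */
	uint32_t sched_priority;
	uint64_t sched_runtime;		/* ns */
	uint64_t sched_deadline;	/* ns */
	uint64_t sched_period;		/* ns */
};

/* Fill attr for a demotion-enabled DEADLINE reservation. */
void demo_fill_attr(struct demo_sched_attr *attr, uint64_t runtime_ns,
		    uint64_t deadline_ns, uint64_t period_ns, int nice)
{
	/* Same ordering the kernel enforces: runtime <= deadline <= period. */
	assert(runtime_ns <= deadline_ns && deadline_ns <= period_ns);
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->sched_policy = SCHED_DEADLINE;
	attr->sched_flags = SCHED_FLAG_DL_DEMOTION;
	attr->sched_nice = nice;
	attr->sched_runtime = runtime_ns;
	attr->sched_deadline = deadline_ns;
	attr->sched_period = period_ns;
}

/* Raw syscall (no libc wrapper assumed); needs CAP_SYS_NICE. */
long demo_set_attr(pid_t pid, struct demo_sched_attr *attr)
{
	return syscall(SYS_sched_setattr, pid, attr, 0);
}
```

For example, demo_fill_attr(&a, 10000000, 30000000, 100000000, 5) followed by demo_set_attr(0, &a) would request 10ms every 100ms and nice 5 while demoted. On kernels without the series the call is expected to fail with EINVAL, since unknown sched_flags bits are rejected.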

This provides a "soft(er) real-time" mode where tasks get timing
guarantees while within budget but gracefully degrade to best-effort
execution during overruns rather than being suspended. The bandwidth
reservation remains in place during demotion, which makes the mechanism
transparent from an admission control perspective, just like throttling.
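A simplified model of why demotion is invisible to admission control. Each reservation is expressed as runtime/period in the same 20-bit fixed point (BW_SHIFT) the kernel uses, summed over all DEADLINE tasks whether or not they are currently demoted, and a new reservation is admitted only if the total stays within the default 95% limit times the number of CPUs. This ignores root domains and the CPU capacity scaling that the real check (__dl_overflow()) applies; the struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BW_SHIFT 20	/* fixed-point shift used for DL bandwidth in the kernel */

/* runtime/period as a BW_SHIFT fixed-point ratio (1.0 == 1 << BW_SHIFT). */
uint64_t bw_ratio(uint64_t period_ns, uint64_t runtime_ns)
{
	assert(period_ns != 0);
	return (runtime_ns << BW_SHIFT) / period_ns;
}

/* Hypothetical per-task record; 'demoted' is deliberately ignored below. */
struct demo_dl_task {
	uint64_t runtime_ns;
	uint64_t period_ns;
	bool demoted;	/* currently running as SCHED_NORMAL */
};

/* Total reserved bandwidth: demoted tasks keep their reservation. */
uint64_t reserved_bw(const struct demo_dl_task *tasks, int n)
{
	uint64_t sum = 0;

	for (int i = 0; i < n; i++)
		sum += bw_ratio(tasks[i].period_ns, tasks[i].runtime_ns);
	return sum;
}

/* Admit a new reservation if the total stays within 95% per CPU
 * (default sched_rt_runtime_us / sched_rt_period_us = 950000 / 1000000). */
bool demo_admit(const struct demo_dl_task *tasks, int n,
		uint64_t runtime_ns, uint64_t period_ns, int ncpus)
{
	uint64_t limit = bw_ratio(1000000, 950000) * (uint64_t)ncpus;

	return reserved_bw(tasks, n) + bw_ratio(period_ns, runtime_ns) <= limit;
}
```

With a 50% task (demoted) and a 40% task on one CPU, a further 4ms/100ms reservation fits but a 10ms/100ms one does not, regardless of the demotion state.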

Key design aspects:

The implementation focuses solely on SCHED_NORMAL demotion, unlike
earlier proposals that suggested multiple demotion targets, including RT
and DL postponement. It is simpler, and maybe enough?

The feature reuses the existing sched_attr.sched_nice field to specify
the nice value during demotion, avoiding new UAPI additions while
maintaining ABI compatibility. This is orthogonal to GRUB
(SCHED_FLAG_RECLAIM): tasks can combine both mechanisms, getting
opportunistic reclaiming through accounting and continued execution
through demotion (at least in principle, I haven't actually tested it yet :).
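While demoted, the task is just another fair-class entity weighted by sched_nice. The sketch below estimates the long-run CPU share a demoted task could expect on its CPU (demoted tasks do not migrate), assuming shares proportional to weight among the runnable fair tasks there and ignoring cgroup hierarchy and latency effects. The table is the kernel's sched_prio_to_weight[]; the helper names are hypothetical.

```c
#include <assert.h>

/* The fair class's nice-to-weight table (sched_prio_to_weight[]). */
static const int prio_to_weight[40] = {
	/* -20 */ 88761, 71755, 56483, 46273, 36291,
	/* -15 */ 29154, 23254, 18705, 14949, 11916,
	/* -10 */  9548,  7620,  6100,  4904,  3906,
	/*  -5 */  3121,  2501,  1991,  1586,  1277,
	/*   0 */  1024,   820,   655,   526,   423,
	/*   5 */   335,   272,   215,   172,   137,
	/*  10 */   110,    87,    70,    56,    45,
	/*  15 */    36,    29,    23,    18,    15,
};

int nice_to_weight(int nice)
{
	assert(nice >= -20 && nice <= 19);
	return prio_to_weight[nice + 20];
}

/* Long-run CPU share of a demoted task on its CPU, assuming shares
 * proportional to weight among the runnable fair tasks there. */
double demoted_share(int demoted_nice, const int *other_nice, int n)
{
	double w = nice_to_weight(demoted_nice);
	double total = w;

	for (int i = 0; i < n; i++)
		total += nice_to_weight(other_nice[i]);
	return w / total;
}
```

For example, a task demoted at nice 0 next to two nice-0 tasks gets about a third of the CPU, while one demoted at nice 19 next to a single nice-0 task gets roughly 1.4%.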

Demoted tasks cannot migrate between CPUs. This simplification keeps
bandwidth accounting straightforward by ensuring the reservation stays
on the original CPU throughout demotion. Migration is re-enabled after
promotion or explicit parameter changes via sched_setattr().
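A toy model of the migration rule, assuming per-CPU this_bw counters like those kept in the kernel's struct dl_rq: migrating a DEADLINE task moves its reservation with it, while a demoted task is refused so its this_bw contribution stays on the original CPU. DEMO_NR_CPUS, demo_rq, demo_mig_task and demo_migrate() are hypothetical names, not code from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NR_CPUS 4

/* Hypothetical per-CPU admission bandwidth (this_bw), fixed point. */
struct demo_rq {
	uint64_t this_bw;
};

struct demo_mig_task {
	int cpu;
	uint64_t dl_bw;		/* runtime/period in fixed point */
	bool demoted;		/* running as SCHED_NORMAL after exhausting runtime */
};

/* Move a DEADLINE task and its reservation to dst. Demoted tasks are
 * refused, so their reservation stays on the original CPU. */
bool demo_migrate(struct demo_rq rqs[], struct demo_mig_task *p, int dst)
{
	assert(dst >= 0 && dst < DEMO_NR_CPUS);
	if (p->demoted || dst == p->cpu)
		return false;
	rqs[p->cpu].this_bw -= p->dl_bw;
	rqs[dst].this_bw += p->dl_bw;
	p->cpu = dst;
	return true;
}
```

Pinning the task avoids having to move this_bw between runqueues while the task is not accounted as a DEADLINE entity.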

The bandwidth accounting follows the throttling model rather than full
class switching. Dequeue operations omit DEQUEUE_SAVE to keep the
reservation in this_bw (admission control bandwidth). Running bandwidth
(enforcement) is released at the 0-lag time for tasks that sleep while
demoted, keeping GRUB accounting correct.
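The 0-lag time is the instant at which the task's remaining runtime would be consumed at its reserved rate. The kernel computes it in task_non_contending() as deadline - runtime * dl_period / dl_runtime: if that instant has already passed, running_bw is decreased right away, otherwise the inactive timer is armed. Below is a simplified userspace restatement with hypothetical struct and function names; it omits the case where the inactive timer is already active.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of a DL scheduling entity, all times in ns. */
struct demo_dl_se {
	int64_t runtime;	/* remaining runtime (<= 0 once exhausted) */
	uint64_t deadline;	/* absolute deadline */
	uint64_t dl_runtime;	/* reserved runtime per period */
	uint64_t dl_period;	/* period */
};

/* Time from 'now' to the 0-lag time. A negative result means running_bw
 * can be dropped immediately; otherwise a timer would fire after it. */
int64_t demo_zerolag_delay(const struct demo_dl_se *se, uint64_t now)
{
	assert(se->dl_runtime > 0);
	int64_t zerolag = (int64_t)se->deadline -
		se->runtime * (int64_t)se->dl_period / (int64_t)se->dl_runtime;

	return zerolag - (int64_t)now;
}
```

In this sketch, a task whose remaining runtime is zero or negative (as after exhaustion) has its 0-lag time at or after its deadline, so its running bandwidth is released no earlier than that.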

Explicit sched_setattr() calls on demoted tasks cancel the demotion
state and perform full bandwidth cleanup including inactive timer
handling and cpuset tracking. The replenishment timer remains armed but
fires harmlessly when it detects the task is no longer DEADLINE.
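The lifecycle described above can be summarized as a small state machine. This is a hypothetical model (names and fields are illustrative, not from the series): exhausting runtime either demotes or throttles the task, the replenishment timer promotes or unthrottles it, and an explicit sched_setattr() cancels the demotion while leaving the timer armed, so a later expiry is a no-op for a task that is no longer DEADLINE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lifecycle model of a task using SCHED_FLAG_DL_DEMOTION. */
enum demo_state { DEMO_DL, DEMO_DEMOTED, DEMO_OTHER };

struct demo_task_state {
	enum demo_state state;
	bool demotion_flag;	/* SCHED_FLAG_DL_DEMOTION requested */
	bool throttled;		/* classic throttling (no demotion flag) */
	bool timer_armed;	/* replenishment timer pending */
};

/* Runtime exhausted: demote if the flag is set, otherwise throttle. */
void demo_runtime_exhausted(struct demo_task_state *t)
{
	if (t->state != DEMO_DL)
		return;
	if (t->demotion_flag)
		t->state = DEMO_DEMOTED;
	else
		t->throttled = true;
	t->timer_armed = true;	/* both paths wait for the next period */
}

/* Replenishment timer: promote or unthrottle; harmless otherwise. */
void demo_replenish_timer(struct demo_task_state *t)
{
	t->timer_armed = false;
	if (t->state == DEMO_DEMOTED)
		t->state = DEMO_DL;
	else if (t->state == DEMO_DL)
		t->throttled = false;
	/* DEMO_OTHER: the task left SCHED_DEADLINE, nothing to do. */
}

/* Explicit sched_setattr(): cancels demotion, leaves the timer armed. */
void demo_setattr(struct demo_task_state *t, bool to_deadline)
{
	t->state = to_deadline ? DEMO_DL : DEMO_OTHER;
	t->throttled = false;
}
```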

This posting is very much experimental. I added AI-generated tests
(included here just for reference) that helped check a few cases
during implementation. However, I am quite sure I'm missing several
other cases that can cause breakage. Test it at your own risk! :P

Based on original work by Dario Faggioli:
https://lore.kernel.org/lkml/1288334546.8661.161.camel@Palantir/

As always comments and questions are more than welcome.

Series also available at

git@xxxxxxxxxx:jlelli/linux.git upstream/deadline-demotion

Signed-off-by: Juri Lelli <juri.lelli@xxxxxxxxxx>
---
Juri Lelli (4):
      sched/deadline: Implement reclaim/soft mode through SCHED_OTHER demotion
      sched/doc: Document SCHED_DEADLINE demotion feature
      DEBUG selftests/sched: Add tests for SCHED_DEADLINE demotion feature
      DEBUG selftests/sched: Add simple demonstration of SCHED_DEADLINE demotion

 Documentation/scheduler/sched-deadline.rst         |  54 +++
 include/linux/sched.h                              |  10 +
 include/uapi/linux/sched.h                         |   4 +-
 include/uapi/linux/sched/types.h                   |   8 +
 kernel/sched/deadline.c                            | 213 +++++++++-
 kernel/sched/fair.c                                |   8 +
 kernel/sched/sched.h                               |  15 +-
 kernel/sched/syscalls.c                            |   8 +
 tools/testing/selftests/sched/.gitignore           |   3 +
 tools/testing/selftests/sched/Makefile             |   4 +-
 tools/testing/selftests/sched/README_dl_demotion   |  83 ++++
 tools/testing/selftests/sched/dl_demotion_demo.c   | 239 +++++++++++
 tools/testing/selftests/sched/dl_demotion_stress.c | 208 ++++++++++
 tools/testing/selftests/sched/dl_demotion_test.c   | 460 +++++++++++++++++++++
 .../selftests/sched/run_dl_demotion_with_trace.sh  |  71 ++++
 15 files changed, 1382 insertions(+), 6 deletions(-)
---
base-commit: e34881c84c255bc300f24d9fe685324be20da3d1
change-id: 20260218-upstream-deadline-demotion-19511e741055

Best regards,
--
Juri Lelli <juri.lelli@xxxxxxxxxx>