Re: [PATCH RFC 0/4] sched/deadline: Add soft/reclaim mode via SCHED_OTHER demotion

From: Qais Yousef

Date: Sun Apr 19 2026 - 16:58:59 EST


On 02/19/26 14:37, Juri Lelli wrote:
> Hi All,
>
> This RFC introduces a bandwidth reclaiming mechanism for SCHED_DEADLINE
> tasks through temporary demotion to SCHED_NORMAL when runtime is
> exhausted. This resurrects and refines the demotion concept from the
> original SCHED_DEADLINE development circa 2010, focusing exclusively on
> SCHED_NORMAL demotion.
>
> Discussions about the feature have been resurfacing over the years and I
> wanted to check for feasibility and real interest. Found a little time
> to play around with the idea and this is the result of that.
>
> When a DEADLINE task with SCHED_FLAG_DL_DEMOTION exhausts its runtime
> budget, the scheduler demotes it to SCHED_NORMAL rather than throttling
> it until the next period. The task continues execution competing fairly
> with other normal tasks, using the nice value specified in
> sched_attr.sched_nice. At the next period boundary, the replenishment
> timer automatically promotes the task back to SCHED_DEADLINE with a
> fresh runtime budget.
>
> This provides a "soft(er) real-time" mode where tasks get timing
> guarantees when within budget but gracefully degrade to best-effort
> execution during overruns rather than being suspended. The bandwidth
> reservation remains in place during demotion, making the mechanism
> transparent from an admission control perspective similar to throttling.
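
For anyone skimming, the cycle described above can be modelled roughly as below. This is only a toy userspace sketch to illustrate the flow, not code from the series: the toy_* names are made up, and the real accounting in kernel/sched/deadline.c is certainly more involved.

```c
#include <stdint.h>

/* Toy model only: hypothetical names, not kernel code. */
enum toy_class { TOY_DEADLINE, TOY_NORMAL_DEMOTED };

struct toy_task {
	enum toy_class cls;
	uint64_t runtime_ns;	/* budget granted per period */
	uint64_t period_ns;	/* reservation period */
	uint64_t remaining_ns;	/* budget left in the current period */
};

void toy_init(struct toy_task *t, uint64_t runtime_ns, uint64_t period_ns)
{
	t->cls = TOY_DEADLINE;
	t->runtime_ns = runtime_ns;
	t->period_ns = period_ns;
	t->remaining_ns = runtime_ns;
}

/* Charge delta_ns of execution; demote instead of throttling on exhaustion. */
void toy_account(struct toy_task *t, uint64_t delta_ns)
{
	if (t->cls != TOY_DEADLINE)
		return;	/* demoted: runs best-effort, nothing left to charge */

	if (delta_ns >= t->remaining_ns) {
		t->remaining_ns = 0;
		t->cls = TOY_NORMAL_DEMOTED;
	} else {
		t->remaining_ns -= delta_ns;
	}
}

/* Period boundary: the replenishment timer promotes with a fresh budget. */
void toy_replenish(struct toy_task *t)
{
	t->remaining_ns = t->runtime_ns;
	t->cls = TOY_DEADLINE;
}
```

The point of the sketch is just that budget exhaustion changes the class rather than stopping the task, and that the period boundary undoes it.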

I think this can be useful for IPC mechanisms like binder. Sadly, binder
tends to be used excessively even when it is not necessary, which easily
adds overhead.

If we can use DL to give them a 0.25-0.5ms chance to finish quickly, and
otherwise demote them to fair, that might be an interesting experiment.
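
A rough sketch of what the userspace side of such an experiment might look like, under a few assumptions: the EXP_SCHED_FLAG_DL_DEMOTION value below is a placeholder (the real bit comes from the series' include/uapi/linux/sched.h change and is not quoted in this thread), the exp_* names are made up, and the 10ms period in the usage note is an arbitrary pick. The struct mirrors the first 48 bytes of struct sched_attr so the snippet does not depend on patched headers.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define EXP_SCHED_DEADLINE		6	/* SCHED_DEADLINE policy number */
/* PLACEHOLDER value: take the real one from the patched UAPI header. */
#define EXP_SCHED_FLAG_DL_DEMOTION	0x80

/* Local mirror of the original (48 byte) struct sched_attr layout. */
struct exp_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;		/* nice value used while demoted */
	uint32_t sched_priority;
	uint64_t sched_runtime;		/* ns */
	uint64_t sched_deadline;	/* ns */
	uint64_t sched_period;		/* ns */
};

/* Fill a DEADLINE reservation with demotion requested (implicit deadline). */
void exp_fill_attr(struct exp_sched_attr *a, uint64_t runtime_ns,
		   uint64_t period_ns, int nice)
{
	memset(a, 0, sizeof(*a));
	a->size = sizeof(*a);
	a->sched_policy = EXP_SCHED_DEADLINE;
	a->sched_flags = EXP_SCHED_FLAG_DL_DEMOTION;
	a->sched_nice = nice;
	a->sched_runtime = runtime_ns;
	a->sched_deadline = period_ns;
	a->sched_period = period_ns;
}

/* Apply to a thread via the raw syscall; returns 0 or -1 with errno set. */
long exp_apply_attr(pid_t tid, const struct exp_sched_attr *a)
{
	return syscall(SYS_sched_setattr, tid, a, 0);
}
```

For instance, exp_fill_attr(&attr, 500 * 1000, 10 * 1000 * 1000, 0) followed by exp_apply_attr(tid, &attr) would ask for 0.5ms every 10ms and nice 0 while demoted. It needs the usual privileges for SCHED_DEADLINE and, of course, a kernel carrying the series.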

Adding Carlos and Alice in case they're interested in looking at this ;-)

If the patches can be merged, it'd be easier to backport and construct an
experiment in general.

(Once globbing is available, constructing such experiments with schedqos
would be easy.)


Thanks

--
Qais Yousef

>
> Key design aspects:
>
> The implementation focuses solely on SCHED_NORMAL demotion, unlike
> earlier proposals that suggested multiple demotion targets including RT
> and DL postponement. Simpler and maybe enough?
>
> The feature reuses the existing sched_attr.sched_nice field to specify
> the nice value during demotion, avoiding new UAPI additions while
> maintaining ABI compatibility. This is orthogonal to GRUB
> (SCHED_FLAG_RECLAIM) - tasks can combine both mechanisms for
> opportunistic reclaiming through accounting and continued execution
> through demotion (at least in principle, didn't actually test it yet :).
>
> Demoted tasks cannot migrate between CPUs. This simplification keeps
> bandwidth accounting straightforward by ensuring the reservation stays
> on the original CPU throughout demotion. Migration is re-enabled after
> promotion or explicit parameter changes via sched_setattr().
>
> The bandwidth accounting follows the throttling model rather than full
> class switching. Dequeue operations omit DEQUEUE_SAVE to keep the
> reservation in this_bw (admission control bandwidth). Running bandwidth
> (enforcement) is handled at 0-lag time for tasks that sleep while
> demoted, maintaining correct GRUB accounting.
>
> Explicit sched_setattr() calls on demoted tasks cancel the demotion
> state and perform full bandwidth cleanup including inactive timer
> handling and cpuset tracking. The replenishment timer remains armed but
> fires harmlessly when it detects the task is no longer DEADLINE.
>
> This posting is very much experimental. I added AI generated tests
> (included here just for reference) that helped checking a few cases
> during implementation. However, I am quite sure I'm missing several
> additional cases that can cause breakage. Test it at your own risk! :P
>
> Based on original work by Dario Faggioli:
> https://lore.kernel.org/lkml/1288334546.8661.161.camel@Palantir/
>
> As always comments and questions are more than welcome.
>
> Series also available at
>
> git@xxxxxxxxxx:jlelli/linux.git upstream/deadline-demotion
>
> Signed-off-by: Juri Lelli <juri.lelli@xxxxxxxxxx>
> ---
> Juri Lelli (4):
>       sched/deadline: Implement reclaim/soft mode through SCHED_OTHER demotion
>       sched/doc: Document SCHED_DEADLINE demotion feature
>       DEBUG selftests/sched: Add tests for SCHED_DEADLINE demotion feature
>       DEBUG selftests/sched: Add simple demonstration of SCHED_DEADLINE demotion
>
>  Documentation/scheduler/sched-deadline.rst         |  54 +++
>  include/linux/sched.h                              |  10 +
>  include/uapi/linux/sched.h                         |   4 +-
>  include/uapi/linux/sched/types.h                   |   8 +
>  kernel/sched/deadline.c                            | 213 +++++++++-
>  kernel/sched/fair.c                                |   8 +
>  kernel/sched/sched.h                               |  15 +-
>  kernel/sched/syscalls.c                            |   8 +
>  tools/testing/selftests/sched/.gitignore           |   3 +
>  tools/testing/selftests/sched/Makefile             |   4 +-
>  tools/testing/selftests/sched/README_dl_demotion   |  83 ++++
>  tools/testing/selftests/sched/dl_demotion_demo.c   | 239 +++++++++++
>  tools/testing/selftests/sched/dl_demotion_stress.c | 208 ++++++++++
>  tools/testing/selftests/sched/dl_demotion_test.c   | 460 +++++++++++++++++++++
>  .../selftests/sched/run_dl_demotion_with_trace.sh  |  71 ++++
>  15 files changed, 1382 insertions(+), 6 deletions(-)
> ---
> base-commit: e34881c84c255bc300f24d9fe685324be20da3d1
> change-id: 20260218-upstream-deadline-demotion-19511e741055
>
> Best regards,
> --
> Juri Lelli <juri.lelli@xxxxxxxxxx>
>