Re: [PATCH RFC 1/4] sched/deadline: Implement reclaim/soft mode through SCHED_OTHER demotion

From: Peter Zijlstra

Date: Fri Feb 20 2026 - 14:47:29 EST


On Thu, Feb 19, 2026 at 02:37:34PM +0100, Juri Lelli wrote:
> Add support for demoting deadline tasks to SCHED_OTHER when they exhaust
> their runtime. This prevents starvation of lower priority tasks while still
> allowing deadline tasks to utilize available CPU bandwidth.
>
> This feature resurrects and refines the bandwidth reclaiming concept
> from the original SCHED_DEADLINE development (circa 2010), focusing on a
> single demotion mode: SCHED_OTHER.

Yeah, that's good enough for most cases, I suppose. Demotion to FIFO/RR is
'weird' anyway.

> @@ -1419,6 +1444,84 @@ s64 dl_scaled_delta_exec(struct rq *rq, struct sched_dl_entity *dl_se, s64 delta
> return scaled_delta_exec;
> }
>
> +/*
> + * Check if a deadline task can be demoted when it exhausts its runtime.
> + * dl-servers and boosted tasks cannot be demoted.
> + *
> + * Returns true if demotion should happen, false otherwise.
> + */
> +static inline bool dl_task_can_demote(struct sched_dl_entity *dl_se)
> +{
> + if (dl_server(dl_se))
> + return false;
> +
> + if (is_dl_boosted(dl_se))
> + return false;
> +
> + return !!(dl_se->flags & SCHED_FLAG_DL_DEMOTION);

It is already implicitly converted to bool by virtue of the return type,
no need for that explicit !!.

> +}
> +
> +/*
> + * Promote a demoted task back to SCHED_DEADLINE.
> + * The task's runtime will be replenished by the caller.
> + */
> +static void dl_task_promote(struct rq *rq, struct task_struct *p)
> +{
> + struct sched_dl_entity *dl_se = &p->dl;
> + int queue_flags = DEQUEUE_MOVE | DEQUEUE_NOCLOCK | DEQUEUE_CLASS;
> +
> + lockdep_assert_rq_held(rq);
> +
> + if (dl_se->dl_demotion_state != DL_DEMOTED)
> + return;
> +
> + dl_se->dl_demotion_state = DL_PROMOTING;
> +
> + scoped_guard (sched_change, p, queue_flags) {
> + p->policy = SCHED_DEADLINE;
> + p->sched_class = &dl_sched_class;
> + p->prio = MAX_DL_PRIO - 1;
> + p->normal_prio = p->prio;
> + }
> +
> + dl_se->dl_demotion_state = DL_NOT_DEMOTED;
> +
> + __balance_callbacks(rq, NULL);
> +}
> +
> +/*
> + * Demote a deadline task to SCHED_OTHER when it exhausts its runtime.
> + * The task will be promoted back to SCHED_DEADLINE at replenish.
> + */
> +static void dl_task_demote(struct rq *rq, struct task_struct *p)
> +{
> + struct sched_dl_entity *dl_se = &p->dl;
> + int queue_flags = DEQUEUE_MOVE | DEQUEUE_NOCLOCK | DEQUEUE_CLASS;
> +
> + lockdep_assert_rq_held(rq);
> +
> + if (dl_se->dl_demotion_state != DL_NOT_DEMOTED || !dl_task_can_demote(dl_se))
> + return;
> +
> + dl_se->dl_demotion_state = DL_DEMOTING;
> +
> + scoped_guard (sched_change, p, queue_flags) {
> + /*
> + * The task's static_prio is already set from the sched_nice
> + * value in sched_attr.
> + */
> + p->policy = SCHED_NORMAL;
> + p->sched_class = &fair_sched_class;
> + p->prio = p->static_prio;
> + p->normal_prio = p->static_prio;
> + }
> +
> + dl_se->dl_demotion_state = DL_DEMOTED;
> +
> + __balance_callbacks(rq, NULL);
> + resched_curr(rq);

Doesn't sched_change already force resched on class degradation?

Anyway, I love how simple this has become ;-)

> +}

> +static void switched_from_dl(struct rq *rq, struct task_struct *p)
> +{
> + /*
> + * If demoting, skip all bandwidth accounting. The bandwidth
> + * reservation stays in place while the task executes as SCHED_NORMAL.
> + */
> + if (p->dl.dl_demotion_state == DL_DEMOTING)
> + return;
> +
> + __dl_cleanup_bandwidth(p, rq);
>
> /*
> * Since this might be the only -deadline task on the rq,
> @@ -3322,6 +3471,16 @@ static void switched_from_dl(struct rq *rq, struct task_struct *p)
> */
> static void switched_to_dl(struct rq *rq, struct task_struct *p)
> {
> + /*
> + * If promoting from demotion, skip bandwidth/cpuset accounting.
> + */
> + if (p->dl.dl_demotion_state == DL_PROMOTING) {
> + if (!task_on_rq_queued(p))
> + return;
> +
> + goto check_preempt;
> + }
> +
> cancel_inactive_timer(&p->dl);
>
> /*

Ah, I wondered where you'd need those DEMOTING/PROMOTING states.


> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c16b5fd71b2d5..59e5459a75492 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9415,6 +9415,14 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
> if (p->sched_task_hot)
> p->sched_task_hot = 0;
>
> + /*
> + * Demoted DEADLINE tasks cannot migrate. Their bandwidth reservation
> + * is tied to the demotion CPU and will be released when the task is
> + * promoted back to DEADLINE or explicitly switched to another policy.
> + */
> + if (!dl_task_can_migrate(p))
> + return 0;

I suppose this works, the alternative is doing migrate_disable() in
demote and migrate_enable() in promote. Not quite sure which is the
least horrible in this case :-)