Re: [PATCH v2] sched/rt: Push RT tasks when preempted by a deadline task
From: yangsonghua
Date: Fri Jun 26 2026 - 03:13:07 EST
Hi Prateek,
Thank you for the thorough review!
You are right about the task_is_blocked() early return — the blocked-task
case does not create new migration opportunities, since balance was already
attempted when p was last scheduled in, and a DL arrival on this CPU does
not change the migrability of tasks already sitting in pushable_tasks.
I will restore the early return in v3.
Will send v3 shortly.
Best regards,
Yangsonghua
K Prateek Nayak <kprateek.nayak@xxxxxxx> 于2026年6月26日周五 13:52写道:
>
> Hello Yangsonghua,
>
> On 6/26/2026 8:53 AM, yangsonghua wrote:
> > [You don't often get email from jluyangsonghua@xxxxxxxxx. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> >
> > Commit 704069649b5b ("sched/core: Rework sched_class::wakeup_preempt()
> > and rq_modified_*()") made wakeup_preempt_rt() callable for cross-class
> > wakeups, leaving a "XXX If we're preempted by DL, queue a push?" comment
> > with no implementation.
> >
> > When a SCHED_DEADLINE task preempts a running SCHED_FIFO/SCHED_RR task,
> > the RT class's put_prev_task_rt() is the natural place to trigger a push:
> > at that point the preempted task is (conditionally) added to
> > pushable_tasks, and we have 'next' available to identify the reason for
> > preemption.
> >
> > The earlier wakeup_preempt_rt() call site is too early: the running RT
> > task has not yet been added to pushable_tasks, so rt_queue_push_tasks()
> > would be a no-op for the common single-RT-task-per-CPU case.
> >
> > Guard the dl_task(next) check with a NULL test. put_prev_task_rt() is
> > reachable with next == NULL when called via the bare put_prev_task()
> > wrapper (sched.h), which always passes NULL:
> >
> > sched_change_begin()
> > put_prev_task(rq, p) /* next = NULL */
> > put_prev_task_rt(rq, p, NULL)
> > dl_task(NULL) -> NULL deref /* BUG in v1 */
> >
> > The actual DL-preemption path goes through put_prev_set_next_task(),
> > which supplies the real next task, so the NULL guard does not change
> > the intended trigger condition.
> >
> > Regarding potentially unnecessary push/pull cycles for short-lived DL
> > tasks: this is an inherent property of any asynchronous push mechanism.
> > rt_queue_push_tasks() enqueues a balance callback rather than migrating
> > synchronously, so by the time it fires the DL task may have already
> > yielded; in that case the RT task may need to be pulled back. This
> > trade-off is acceptable: leaving RT tasks stranded on a busy CPU
> > while peer CPUs sit idle is worse than an occasional redundant
> > migration. The same concern applies to the existing triggers of
> > rt_queue_push_tasks() (e.g. after dequeue_task_rt), so this path is
> > consistent with the established policy.
> >
> > Fix this by adding a dl_task(next) check in put_prev_task_rt(), guarded
> > against NULL. This is intentionally placed outside and after the
> > enqueue_pushable_task() condition, so that a push is triggered even when
> > the preempted RT task is pinned (nr_cpus_allowed == 1) or blocked
> > (proxy-exec): in those cases there may be other migratable RT tasks
> > already sitting in pushable_tasks.
>
> That cannot be the case. More details below.
>
> >
> > The task_is_blocked() early-return is folded into the
> > enqueue_pushable_task() condition to avoid inadvertently suppressing the
> > new push logic.
> >
> > Note: this does not cover the DL-server case where a fair-class task
> > running under DL bandwidth budget displaces an RT task; that is a
> > separate concern.
> >
> > Signed-off-by: yangsonghua <yangsonghua@xxxxxxxxxxx>
> > ---
> > Changes in v2:
> > - Guard dl_task(next) with a 'next != NULL' check to fix a NULL pointer
> > dereference when put_prev_task_rt() is called from sched_change_begin()
> > via the put_prev_task() wrapper, which always passes next = NULL.
> > - Extend the commit message to explain the NULL call path and address the
> > reviewer concern about unnecessary push/pull cycles for short-lived DL
> > tasks.
> > - Improve wakeup_preempt_rt() comment to explicitly point readers to
> > put_prev_task_rt() where the DL push is now handled.
> >
> > kernel/sched/rt.c | 24 +++++++++++++++++++-----
> > 1 file changed, 19 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index e474c31d8fe6..ef8cdc4c25d2 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -1627,7 +1627,10 @@ static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
> > struct task_struct *donor = rq->donor;
> >
> > /*
> > - * XXX If we're preempted by DL, queue a push?
> > + * If the incoming task belongs to a higher-priority class (e.g. DL),
> > + * we only handle same-class RT preemption here. The DL case is
> > + * handled in put_prev_task_rt() once the current task has been
> > + * added to pushable_tasks.
> > */
> > if (p->sched_class != &rt_sched_class)
> > return;
> > @@ -1736,14 +1739,25 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p, struct task_s
> >
> > update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 1);
> >
> > - if (task_is_blocked(p))
> > - return;
>
> I still don't get why we are removing this early return. When "p" was
> selected to run, it would have already queued a balance callback via
> set_nex_task_rt() and the balance would have been done during last
> __schedule().
>
> If there are pushable tasks on the list still, they'll still remain
> non-pushable since they haven't run on CPU and nothing has changed for
> them to suddenly enable a movement.
>
> "p" itself is a blocked task so it cannot be moved - it'll simply
> follow the lock owner via proxy migration path whenever it gets a
> chance.
>
> So the gist is: If there are pushable tasks, then balance failed to
> move them previously. If the state of the system had changed,
> balance_rt() on a remote CPU would have already pulled the highest
> priority task from this rq.
>
> I feel that early return for blocked task should stay as is since
> there is no opportunity for balance yet unless I'm missing something.
>
> > /*
> > * The previous task needs to be made eligible for pushing
> > - * if it is still active
> > + * if it is still active and migratable.
> > */
> > - if (on_rt_rq(&p->rt) && p->nr_cpus_allowed > 1)
> > + if (!task_is_blocked(p) && on_rt_rq(&p->rt) && p->nr_cpus_allowed > 1)
> > enqueue_pushable_task(rq, p);
> > +
> > + /*
> > + * When a deadline task takes over this CPU, try to push any queued
> > + * RT tasks to CPUs running lower-priority work. This is independent
> > + * of whether p itself is pushable: even if p is pinned or blocked,
> > + * there may be other migratable RT tasks already in pushable_tasks.
> > + *
> > + * next is NULL when called from put_prev_task() (e.g. sched_change_begin);
> > + * guard accordingly. rt_queue_push_tasks() checks has_pushable_tasks()
> > + * internally, so this is a no-op if nothing is queued.
> > + */
> > + if (next && dl_task(next))
> > + rt_queue_push_tasks(rq);
>
> Based on some quick look at the cpupri bits, I think this makes sense.
>
> Since we try to move "p" to a CPU running a lower priority task and a
> deadline task would have increased the priority of this CPU anyways, it
> is fair to attempt a push.
>
> I'll let Steve finish his metamorphosis and see if that estimate is
> wrong or not ;-)
>
> > }
> >
> > /* Only try algorithms three times */
> > --
> > 2.34.1
> >
>
> --
> Thanks and Regards,
> Prateek
>