Re: [PATCH sched/core] sched/rt: Fix RT_PUSH_IPI soft lockup loop

From: Steven Rostedt

Date: Tue May 12 2026 - 17:35:02 EST


On Tue, 12 May 2026 08:07:58 -1000
Tejun Heo <tj@xxxxxxxxxx> wrote:

> Hello,
>
> Looking at 49bef33e4b87 ("sched/rt: Plug rt_mutex_setprio() vs
> push_rt_task() race"), the prio bail looks like it was already there
> and only got moved up to retry:. For non-migration-disabled next_task
> the bail fires at the same effective point both before and after, and
> rto_push_irq_work_func() + rto_next_cpu() were already in their
> current shape, so the loop seems reachable before the move too -
> b6366f048e0c ("sched/rt: Use IPI to trigger RT task push migration
> instead of pulling") looks like the actual origin.
>
> Am I reading it wrong?
>

No, I missed the movement of that code. Which means I need to understand
the problem better.

I'm still wondering about the trigger of this. That shortcut means the
current process is of lower priority than the waiting tasks and a simple
schedule should happen. From your tests, can you see why a lower priority
process was running on the CPU instead of a higher priority process?

Also, the IPIs only happen when another CPU is about to schedule something
of lower priority and tries to pull a task to itself.

From your description, you are seeing a storm of IPIs from all these CPUs
before the first CPU could return from hard interrupt and schedule?

I'm thinking there may be something else wrong here.

Note, the RT_PUSH_IPI logic only has a single iteration in flight at a
time. If one is already running and another CPU wants to do a "push", it
simply ups the counter so the loop tries again. It doesn't send another IPI.
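
To make that concrete, here is a rough userspace model of the "one
iteration, bump the counter" behaviour. It is only a sketch: struct
push_state, request_push(), pass_finished() and demo_ipis_sent() are
made-up names for illustration, the model is single-threaded, and the plain
ipi_active flag stands in for state the kernel keeps under a lock. It is
not the code in kernel/sched/rt.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model state; field names are illustrative, not the kernel's. */
struct push_state {
	atomic_int loop_next;	/* bumped by every push request */
	int loop_seen;		/* loop_next value when the current pass began */
	bool ipi_active;	/* true while one IPI chain is in flight */
	int ipis_sent;		/* number of IPI chains actually started */
};

/* A CPU that is about to run something of lower priority asks for a push. */
static void request_push(struct push_state *s)
{
	/* Always bump the counter so a running chain knows to go again. */
	atomic_fetch_add(&s->loop_next, 1);

	/* A chain is already in flight: do not send another IPI. */
	if (s->ipi_active)
		return;

	s->ipi_active = true;
	s->loop_seen = atomic_load(&s->loop_next);
	s->ipis_sent++;
}

/*
 * Called once the chain has visited every overloaded CPU.
 * Returns true if the counter moved and another pass is needed.
 */
static bool pass_finished(struct push_state *s)
{
	int next = atomic_load(&s->loop_next);

	if (next != s->loop_seen) {
		s->loop_seen = next;
		return true;
	}

	s->ipi_active = false;
	return false;
}

/* Three requests arrive while one chain runs; count the chains started. */
static int demo_ipis_sent(void)
{
	struct push_state s = {0};

	request_push(&s);	/* starts the chain */
	request_push(&s);	/* absorbed by the counter */
	request_push(&s);	/* absorbed by the counter */

	assert(pass_finished(&s));	/* counter moved, go around again */
	assert(!pass_finished(&s));	/* nothing new, the chain stops */

	return s.ipis_sent;
}
```

In this model three push requests result in a single chain that does one
extra pass, which is why a storm of IPIs from that path alone would be
surprising.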

Do you have a trace that shows what is happening?

# trace-cmd start -e sched_switch -e sched_waking -e irq -e workqueue
# echo 1 > /proc/sys/kernel/traceoff_on_warning
# trace-cmd extract

may be enough.

You may need to add some trace_printk()s into the IPI logic code too.

-- Steve