Re: [PATCH sched/core] sched/rt: Fix RT_PUSH_IPI soft lockup loop
From: Valentin Schneider
Date: Tue May 12 2026 - 16:10:32 EST

On 12/05/26 11:37, Steven Rostedt wrote:
> [ Adding some RT folks ]
>
> Also, Valentin, can you look at this, because I believe the issue was
> introduced by your change (see below).
>

Woops!

> IIRC, the test we had was simply cyclictest that we ran with the following
> parameters. From commit b6366f048e0ca ("sched/rt: Use IPI to trigger RT
> task push migration instead of pulling"), it states it runs:
>
> cyclictest --numa -p95 -m -d0 -i100
>
> The above runs a thread on each CPU at priority 95 and will sleep for
> 100us. Each thread should wake up at the same time. You can read the commit
> message for more details, but the tl;dr is that without the IPI push
> request, if one of the CPUs ran another RT task besides cyclictest, then
> all the others would then ask to pull from it when the other CPUs'
> cyclictest threads went to sleep. Having over 100 CPUs send an IPI to pull
> a task when only the first one would get it caused a large latency,
> especially since it took the rq lock over and over again.
>
> But, the code being fixed wasn't due to that commit, but due to the commit
> that added the short cut of the logic. That commit fixes a race with the
> normal call to push_rt_task() and I think the pull logic issue was a side
> effect.
>
> I agree with Tejun's change, it actually puts the logic for the IPI pull
> back to what it was before commit 49bef33e4b87b. The bug was added by the
> shortcut case to push_rt_task() that was only meant for the !pull scenario.
> Adding !pull to the if conditional seems like the correct change.
>
> Valentin, can you confirm please.
>

So looking back at the original report for my patch:
https://lore.kernel.org/all/Yb3vXx3DcqVOi+EA@donbot/
the splat happened through rto_push_irq_work_func(), i.e. with pull=true
(that naming always causes me to shuffle through my notes; AFAICT that's
because it's when push_rt_task() is invoked due to a pull_rt_task() call
but urgh).

So IIUC I'm afraid the suggested fix would cause the original issue to
resurface, but that still leaves us with the reported soft lockup issue. I
don't have any inspiration so far, I'll sleep on it.
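
For reference, a toy userspace model of the conditional being discussed.
This is only a sketch and not the kernel code: the function names are
hypothetical, and I'm assuming the shortcut in question is the "next
pushable task has higher priority than current" early return in
push_rt_task(). It only shows which (prio, pull) combinations would take
the shortcut with and without the suggested "!pull".

```c
#include <stdbool.h>

/*
 * Toy model only. In the kernel, a numerically lower prio value means a
 * higher priority, so "next beats current" is next_prio < curr_prio.
 */
static bool higher_prio(int next_prio, int curr_prio)
{
	return next_prio < curr_prio;
}

/*
 * Hypothetical stand-in for the shortcut as it is described in this
 * thread: it is taken regardless of the pull argument.
 */
static bool shortcut_current(int next_prio, int curr_prio, bool pull)
{
	(void)pull; /* deliberately ignored in this variant */
	return higher_prio(next_prio, curr_prio);
}

/*
 * Hypothetical stand-in for the suggested change: same check, with
 * "!pull" added to the conditional.
 */
static bool shortcut_suggested(int next_prio, int curr_prio, bool pull)
{
	return !pull && higher_prio(next_prio, curr_prio);
}
```

With pull=true and a higher-priority next task, shortcut_current() returns
true and shortcut_suggested() returns false, which is the path the original
report went through (rto_push_irq_work_func()).
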
> Please update the Fixes tag to point to the appropriate commit as well as
> update the change log. With that:
>
> Reviewed-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
>
> -- Steve