Re: [PATCH] Revert "sched/fair: Make sure to try to detach at least one movable task"
From: Peter Zijlstra
Date: Tue Jun 25 2024 - 04:37:44 EST

On Mon, Jun 24, 2024 at 11:18:59AM +0200, Vincent Guittot wrote:
> On Thu, 20 Jun 2024 at 23:45, Josh Don <joshdon@xxxxxxxxxx> wrote:
> >
> > This reverts commit b0defa7ae03ecf91b8bfd10ede430cff12fcbd06.
> >
> > b0defa7ae03ec changed the load balancing logic to ignore env.max_loop if
> > all tasks examined to that point were pinned. The goal of the patch was
> > to make it more likely to be able to detach a task buried in a long list
> > of pinned tasks. However, this has the unfortunate side effect of
> > creating an O(n) iteration in detach_tasks(), as we now must fully
> > iterate every task on a cpu if all or most are pinned. Since this load
> > balance code is done with rq lock held, and often in softirq context, it
> > is very easy to trigger hard lockups. We observed such hard lockups with
> > a user who affined O(10k) threads to a single cpu.
> >
> > When I discussed this with Vincent he initially suggested that we keep
> > the limit on the number of tasks to detach, but increase the number of
> > tasks we can search. However, after some back and forth on the mailing
> > list, he recommended we instead revert the original patch, as it seems
> > likely no one was actually getting hit by the original issue.
> >
>
> Maybe add a
> Fixes: b0defa7ae03e ("sched/fair: Make sure to try to detach at least one movable task")
>
> > Signed-off-by: Josh Don <joshdon@xxxxxxxxxx>
>
> Reviewed-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>

Thanks guys, queued it for sched/urgent.
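
For anyone following along, the cost described in the commit message can be illustrated with a toy model of the scan loop. This is only a sketch and not the kernel's detach_tasks(): the function scan_tasks(), its parameters, and the value 32 used for the limit below are all hypothetical, and the model stops at the first movable task rather than tracking load. It shows how ignoring the limit (what the message calls env.max_loop) while every task seen so far is pinned makes the scan grow with the number of pinned tasks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of a bounded task scan; NOT the kernel's detach_tasks().
 *
 * pinned[i] is true when task i may not run on the destination CPU.
 * loop_max is the scan limit (hypothetical stand-in for the limit the
 * commit message calls env.max_loop).
 * ignore_limit_while_all_pinned models the behaviour being reverted:
 * keep scanning past the limit until a movable task has been seen.
 *
 * Returns the number of tasks examined before the scan stopped.
 */
size_t scan_tasks(const bool *pinned, size_t nr_tasks,
                  size_t loop_max, bool ignore_limit_while_all_pinned)
{
    bool all_pinned = true;   /* cleared once a movable task is seen */
    size_t examined = 0;

    assert(nr_tasks == 0 || pinned != NULL);

    for (size_t i = 0; i < nr_tasks; i++) {
        /* Stop at the limit unless the reverted behaviour applies. */
        if (examined >= loop_max &&
            !(ignore_limit_while_all_pinned && all_pinned))
            break;

        examined++;

        if (!pinned[i]) {
            all_pinned = false;
            break;            /* model: detach one task and stop */
        }
    }

    return examined;
}
```

With 10000 pinned tasks and a limit of 32, the bounded variant examines 32 tasks, while the variant that ignores the limit examines all 10000. In the real code that walk happens with the rq lock held.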