Re: [PATCH -v2 15/17] sched: Fix migrate_disable() vs rt/dl balancing

From: Juri Lelli
Date: Tue Oct 06 2020 - 10:37:14 EST


On 06/10/20 15:48, Peter Zijlstra wrote:
> On Tue, Oct 06, 2020 at 12:20:43PM +0100, Valentin Schneider wrote:
> >
> > On 05/10/20 15:57, Peter Zijlstra wrote:
> > > In order to minimize the interference of migrate_disable() on lower
> > > priority tasks, which can be deprived of runtime due to being stuck
> > > below a higher priority task, teach the RT/DL balancers to push away
> > > these higher priority tasks when a lower priority task gets selected
> > > to run on a freshly demoted CPU (pull).

Still digesting the whole lot, but can't we "simply" force push the
highest prio task (the one we preempt to make space for the
migrate_disabled lower prio task) directly to the CPU that would accept
the lower prio task that cannot move?

Asking because AFAIU we are calling find_lock_rq from push_cpu_stop and
that selects the best cpu for the high prio. I'm basically wondering if
we could avoid moving, potentially multiple, high prio tasks around to
make space for a lower prio task.
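
To make the question concrete, here is a toy user-space sketch (not
kernel code). Everything in it is an assumption made for illustration:
the toy_* helpers are hypothetical, larger numbers mean higher priority,
and toy_find_lowest_cpu() only loosely mimics what I understand
find_lock_rq to end up doing for the high prio task. It just shows that
the "best CPU for the high prio task" and "the CPU that wanted the stuck
low prio task" can be different CPUs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only. Larger value = higher priority; 0 = idle CPU. */

/*
 * Return the CPU whose running task has the lowest priority strictly
 * below task_prio, skipping src_cpu. Return -1 if no CPU would accept
 * the task. (Hypothetical stand-in for the "best CPU" selection.)
 */
static int toy_find_lowest_cpu(const int *cpu_prio, size_t nr_cpus,
                               int task_prio, int src_cpu)
{
    int best = -1;

    for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
        if ((int)cpu == src_cpu)
            continue;               /* never push to ourselves */
        if (cpu_prio[cpu] >= task_prio)
            continue;               /* that CPU would not be preempted */
        if (best < 0 || cpu_prio[cpu] < cpu_prio[best])
            best = (int)cpu;
    }
    return best;
}

/*
 * The variant asked about above: push the high prio task straight to
 * the CPU that tried to pull the migrate-disabled task, provided that
 * CPU would accept it. Return -1 otherwise.
 */
static int toy_direct_target(const int *cpu_prio, int task_prio,
                             int pull_cpu)
{
    return cpu_prio[pull_cpu] < task_prio ? pull_cpu : -1;
}
```

With cpu_prio = {90, 10, 5}, a prio 90 task on CPU 0 and CPU 1 being the
one that tried to pull, the first helper picks CPU 2 while the second
picks CPU 1.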

> > > This adds migration interference to the higher priority task, but
> > > restores bandwidth to the system that would otherwise be irrevocably lost.
> > > Without this it would be possible to have all tasks on the system
> > > stuck on a single CPU, each task preempted in a migrate_disable()
> > > section with a single high priority task running.
> > >
> > > This way we can still approximate running the M highest priority tasks
> > > on the system.
> > >
> >
> > Ah, so IIUC that's the important bit that means we can't just go
> > through the pushable_tasks list and skip migrate_disable() tasks.
> >
> > Once the highest-prio task exits its migrate_disable() region, your patch
> > pushes it away. If we ended up with a single busy CPU, it'll spread the
> > tasks around one migrate_enable() at a time.
> >
> > That time where the top task is migrate_disable() is still a crappy time,
> > and as you pointed out earlier today if it is a genuine pcpu task then the
> > whole thing is -EBORKED...
> >
> > An alternative I could see would be to prevent those piles from forming
> > altogether, say by issuing a similar push_cpu_stop() on migrate_disable()
> > if the next pushable task is already migrate_disable(); but that's a
> > proactive approach whereas yours is reactive, so I'm pretty sure that's
> > bound to perform worse.
>
> I think it is always possible to form pileups. Just start enough tasks
> such that newer, higher priority, tasks have to preempt existing tasks.
>
> Also, we might not be able to place the task elsewhere. Suppose all
> our M CPUs are each running an RT task; then, while the lowest priority
> task is in migrate_disable(), wake a new highest priority task.
>
> Per the SMP invariant, this new highest priority task must preempt the
> lowest priority task currently running, otherwise we would not be
> running the M highest prio tasks.
>
> That's not to say it might not still be beneficial to try to avoid
> them, but we must assume a pileup will occur, therefore my focus was on
> dealing with them as best we can first.
>
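
The wakeup scenario above can be sketched as a toy user-space model (not
kernel code). The struct and the toy_* helpers are hypothetical, and
larger numbers mean higher priority. The only point is that keeping the
M highest prio tasks running forces the wakeup to land on the CPU
running the lowest prio task, and if that task is in a migrate_disable()
section it stays stuck underneath.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy per-CPU state; all fields are illustrative assumptions. */
struct toy_cpu {
    int  curr_prio;              /* larger value = higher priority */
    bool curr_migrate_disabled;  /* running task is in migrate_disable() */
    int  nr_stuck;               /* preempted tasks that cannot leave */
};

/* Return the CPU currently running the lowest priority task. */
static int toy_lowest_prio_cpu(const struct toy_cpu *cpus, size_t nr_cpus)
{
    assert(nr_cpus > 0);

    int victim = 0;

    for (size_t i = 1; i < nr_cpus; i++) {
        if (cpus[i].curr_prio < cpus[victim].curr_prio)
            victim = (int)i;
    }
    return victim;
}

/*
 * Wake a task of priority wake_prio. To keep the M highest priority
 * tasks running it must preempt the lowest priority running task.
 * Return the CPU it lands on, or -1 if it preempts nothing.
 */
static int toy_wake(struct toy_cpu *cpus, size_t nr_cpus, int wake_prio)
{
    int victim = toy_lowest_prio_cpu(cpus, nr_cpus);

    if (cpus[victim].curr_prio >= wake_prio)
        return -1;                   /* not among the M highest */

    if (cpus[victim].curr_migrate_disabled)
        cpus[victim].nr_stuck++;     /* preempted task cannot be pushed */

    cpus[victim].curr_prio = wake_prio;
    cpus[victim].curr_migrate_disabled = false;
    return victim;
}
```

With three CPUs running prio 80, 60 and 40, and the prio 40 task in
migrate_disable(), waking a prio 99 task lands on the third CPU and
leaves one stuck task there, so a pileup has started no matter how the
tasks were placed beforehand.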