Re: [RFC V2 2/2] sched/fair: Fallback to sched-idle CPU if idle CPU isn't found

From: Viresh Kumar
Date: Mon May 13 2019 - 05:35:55 EST

On 10-05-19, 09:21, Peter Zijlstra wrote:
> On Thu, Apr 25, 2019 at 03:07:40PM +0530, Viresh Kumar wrote:
> > We target for an idle CPU in select_idle_sibling() to run the next task,
> > but in case we don't find idle CPUs it is better to pick a CPU which
> > will run the task the soonest, for performance reasons. A CPU which isn't
> > idle but has only SCHED_IDLE activity queued on it should be a good
> > target based on this criterion as any normal fair task will most likely
> > preempt the currently running SCHED_IDLE task immediately. In fact,
> > choosing a SCHED_IDLE CPU shall give better results as it should be able
> > to run the task sooner than an idle CPU (which needs to be woken up
> > from an idle state).
> >
> > This patch updates the fast path to fall back to a sched-idle CPU if the
> > idle CPU isn't found, the slow path can be updated separately later.
> >
> > Following is the order in which select_idle_sibling() picks up next CPU
> > to run the task now:
> >
> > 1. idle_cpu(target) OR sched_idle_cpu(target)
> > 2. idle_cpu(prev) OR sched_idle_cpu(prev)
> > 3. idle_cpu(recent_used_cpu) OR sched_idle_cpu(recent_used_cpu)
> > 4. idle core(sd)
> > 5. idle_cpu(sd)
> > 6. sched_idle_cpu(sd)
> > 7. idle_cpu(p) - smt
> > 8. sched_idle_cpu(p) - smt
> >
> > Though the policy can be tweaked a bit if we want to have different
> > priorities.
>
> I don't hate this per se; but the whole select_idle_sibling() thing is
> something that needs looking at.
>
> There was the task stealing thing from Steve that looked interesting and
> that would render your approach unfeasible.

I am surely missing something, as I don't see how that patchset would
make this one perform any worse than it already does.

The idea of this patchset is to find a CPU which can run the task the
soonest if no other CPU is idle. If a CPU is idle we still want to run
the task on that one to finish work asap. This patchset only updates
the fast path right now and doesn't touch the slow path or the
periodic/idle load-balance paths. That would be the next step for sure
though.
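
To illustrate the idea, here is a rough userspace sketch of the
fallback. It is only a toy model under stated assumptions, not the
patch itself: struct cpu_state, pick_cpu() and the model_*() helpers
are made-up names for illustration, and the "only SCHED_IDLE tasks
queued" test is a simplified stand-in for whatever the kernel check
ends up being.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a CPU's run queue; NOT the kernel's struct rq. */
struct cpu_state {
	unsigned int nr_running;     /* all runnable tasks on this CPU */
	unsigned int nr_idle_policy; /* how many of those are SCHED_IDLE */
};

/* Nothing is queued at all. */
static bool model_idle_cpu(const struct cpu_state *c)
{
	return c->nr_running == 0;
}

/* Something is queued, but every queued task is SCHED_IDLE. */
static bool model_sched_idle_cpu(const struct cpu_state *c)
{
	return c->nr_running != 0 && c->nr_running == c->nr_idle_policy;
}

/*
 * Scan the candidates once. A fully idle CPU wins immediately;
 * otherwise fall back to the first sched-idle CPU seen, or return
 * -1 if there is neither.
 */
static int pick_cpu(const struct cpu_state *cpus, int nr_cpus)
{
	int si_cpu = -1;

	assert(cpus != NULL || nr_cpus <= 0);

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (model_idle_cpu(&cpus[cpu]))
			return cpu;
		if (si_cpu == -1 && model_sched_idle_cpu(&cpus[cpu]))
			si_cpu = cpu;
	}

	return si_cpu;
}
```

The sched-idle CPU is only remembered during the scan and never
returned early, so an idle CPU found later in the same scan still
takes priority, as described above.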

Steve's patchset (IIUC) adds a new fast way of doing idle load balancing
at the LLC level, which, as far as this patchset is concerned, is no
different from normal idle load balancing. In fact, I would say that
Steve's patchset makes our work easier to extend going forward, as we
can capitalize on the new *fast* infrastructure to pull tasks even when
a CPU isn't fully idle but only has sched-idle stuff on it.

Does this make sense?

@Song: Thanks for giving this a try and I am really happy to see your
results. I do see that we still don't get the performance we wanted,
perhaps because we only touch the fast path. Maybe load-balance screws
it up for us at a later point in time and CPUs are left with only
sched-idle tasks. Not sure though.