Re: [RFC V2 2/2] sched/fair: Fallback to sched-idle CPU if idle CPU isn't found

From: Steven Sistare
Date: Tue May 14 2019 - 12:05:34 EST


On 5/13/2019 7:35 AM, Peter Zijlstra wrote:
> On Mon, May 13, 2019 at 03:04:18PM +0530, Viresh Kumar wrote:
>> On 10-05-19, 09:21, Peter Zijlstra wrote:
>
>>> I don't hate this per se; but the whole select_idle_sibling() thing is
>>> something that needs looking at.
>>>
>>> There was the task stealing thing from Steve that looked interesting and
>>> that would render your approach unfeasible.
>>
>> I am surely missing something, as I don't see how that patchset would
>> make this patchset perform any worse than it already does.
>
> Nah; I just misremembered. I know Oracle has a patch set poking at
> select_idle_sibling() _somewhere_ (as do I), and I just found the wrong
> one.
>
> Basically everybody is complaining select_idle_sibling() is too
> expensive for checking the entire LLC domain, except for FB (and thus
> likely some other workloads too) that depend on it to kill their tail
> latency.
>
> But I suppose we could still do this, even if we scan only a subset of
> the LLC, just keep track of the last !idle CPU running only SCHED_IDLE
> tasks and pick that if you do not (in your limited scan) find a better
> candidate.

Subhra posted a patch that incrementally searches for an idle CPU in the LLC,
remembering the last CPU examined, and searching a fixed number of CPUs from there.
That technique is compatible with the one that Viresh suggests; the incremental
search would stop as soon as it found a CPU running only SCHED_IDLE tasks.
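
To make the combination concrete, here is a rough userspace sketch of a bounded,
resumable scan that also stops on a sched-idle CPU. It is not Subhra's patch and
not kernel code: NR_CPUS, SCAN_LIMIT, struct llc, enum cpu_state and pick_cpu()
are all hypothetical names invented for illustration.

```c
#include <assert.h>

#define NR_CPUS    8   /* hypothetical number of CPUs in the LLC */
#define SCAN_LIMIT 4   /* hypothetical cap on CPUs examined per call */

enum cpu_state {
	CPU_BUSY,            /* running at least one non-SCHED_IDLE task */
	CPU_SCHED_IDLE_ONLY, /* running only SCHED_IDLE tasks */
	CPU_IDLE,            /* running nothing */
};

struct llc {
	enum cpu_state state[NR_CPUS];
	int next;            /* CPU at which the next scan resumes */
};

/*
 * Examine at most SCAN_LIMIT CPUs, starting where the previous scan
 * stopped.  Stop at the first CPU that is idle or runs only SCHED_IDLE
 * tasks.  Return that CPU, or -1 if the limited scan found none.
 */
static int pick_cpu(struct llc *llc)
{
	int cpu = llc->next;

	assert(cpu >= 0 && cpu < NR_CPUS);

	for (int i = 0; i < SCAN_LIMIT; i++) {
		int cur = cpu;

		cpu = (cpu + 1) % NR_CPUS;	/* wrap around the LLC */
		if (llc->state[cur] != CPU_BUSY) {
			llc->next = cpu;	/* remember where to resume */
			return cur;
		}
	}

	llc->next = cpu;
	return -1;
}
```

A caller would fall back to its usual target (for example the previous CPU)
when pick_cpu() returns -1. Peter's variant differs slightly: it would keep
scanning for a fully idle CPU and only use the remembered sched-idle CPU if
the limited scan found nothing better.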

I also fiddled with select_idle_sibling(), maintaining a per-LLC bitmap of idle
CPUs, updated with atomic operations. Performance was basically unchanged for the
workloads I tested, and timers I inserted around the idle search showed that it
took a very small fraction of the time both before and after my changes. That led
me to ignore the push side and optimize the pull side with task stealing.
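
For anyone who wants to picture the bitmap experiment, the idea was roughly as
follows. This is a userspace C11 sketch under my own assumptions, not the code I
tested: struct llc_idle_map and the llc_*() helpers are hypothetical names, and
it assumes an LLC of at most 64 CPUs so that a single word suffices.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define LLC_MAX_CPUS 64	/* assumption: one 64-bit word covers the LLC */

struct llc_idle_map {
	_Atomic uint64_t bits;	/* bit N set means CPU N is idle */
};

/* Called by a CPU when it goes idle. */
static void llc_set_idle(struct llc_idle_map *m, int cpu)
{
	assert(cpu >= 0 && cpu < LLC_MAX_CPUS);
	atomic_fetch_or(&m->bits, UINT64_C(1) << cpu);
}

/* Called by a CPU when it stops being idle. */
static void llc_clear_idle(struct llc_idle_map *m, int cpu)
{
	assert(cpu >= 0 && cpu < LLC_MAX_CPUS);
	atomic_fetch_and(&m->bits, ~(UINT64_C(1) << cpu));
}

/*
 * Return the lowest-numbered idle CPU, or -1 if the map is empty.
 * The answer is only a hint: the CPU may become busy right after
 * the load.
 */
static int llc_find_idle(struct llc_idle_map *m)
{
	uint64_t v = atomic_load(&m->bits);
	int cpu = 0;

	if (v == 0)
		return -1;

	while (!(v & 1)) {	/* portable find-first-set */
		v >>= 1;
		cpu++;
	}
	return cpu;
}
```

The search becomes one load plus a bit scan instead of a walk over the LLC, at
the cost of an atomic update on every idle entry and exit.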

I would be very interested in hearing from folks who have workloads demonstrating
that select_idle_sibling() is too expensive.

- Steve