Re: [RFC PATCH 0/3] newidle_balance() latency mitigation
From: Valentin Schneider
Date: Thu Apr 30 2020 - 06:14:13 EST
On 30/04/20 08:44, Vincent Guittot wrote:
> On Thu, 30 Apr 2020 at 01:13, Valentin Schneider
> <valentin.schneider@xxxxxxx> wrote:
>>
>>
>> On 28/04/20 06:02, Scott Wood wrote:
>> > These patches mitigate latency caused by newidle_balance() on large
>> > systems, by enabling interrupts when the lock is dropped, and exiting
>> > early at various points if an RT task is runnable on the current CPU.
>> >
>> > When applied to an RT kernel on a 72-core machine (2 threads per core), I
>> > saw significant reductions in latency as reported by rteval -- from
>> > over 500us to around 160us with hyperthreading disabled, and from
>> > over 1400us to around 380us with hyperthreading enabled.
>> >
>> > This isn't the first time something like this has been tried:
>> > https://lore.kernel.org/lkml/20121222003019.433916240@xxxxxxxxxxx/
>> > That attempt ended up being reverted:
>> > https://lore.kernel.org/lkml/5122CD9C.9070702@xxxxxxxxxx/
>> >
>> > The problem in that case was the failure to keep BH disabled, and the
>> > difficulty of fixing that when called from the post_schedule() hook.
>> > This patchset uses finish_task_switch() to call newidle_balance(), which
>> > enters in non-atomic context so we have full control over what we disable
>> > and when.
>> >
>> > There was a note at the end about wanting further discussion on the matter --
>> > does anyone remember if that ever happened and what the conclusion was?
>> > Are there any other issues with enabling interrupts here and/or moving
>> > the newidle_balance() call?
>> >
>>
>> Random thought that just occurred to me; in the grand scheme of things,
>> with something in the same spirit as task-stealing (i.e. don't bother with
>> a full-fledged balance at newidle, just pick one spare task somewhere),
>> none of this would be required.
>
> Newly idle load balance already stops after picking one task.

Mph, I had already forgotten your changes there. Is that really always the
case for newidle? In the busiest->group_type == group_fully_busy case, for
instance, I think we can pull more than one task.

> Now if your proposal is to pick one random task on one random cpu, I'm
> clearly not sure that's a good idea
>
IIRC Steve's implementation was to "simply" pull one task from any CPU
within the LLC domain that had more than one runnable task. I quite like
this, since picking any one task is almost always better than switching to
the idle task, but it wasn't a complete newidle_balance() replacement just yet.
>
>>
>> Sadly, I don't think anyone has been looking at it recently.