Re: [RFC PATCH 0/3] newidle_balance() latency mitigation

From: Valentin Schneider
Date: Wed Apr 29 2020 - 19:13:25 EST



On 28/04/20 06:02, Scott Wood wrote:
> These patches mitigate latency caused by newidle_balance() on large
> systems, by enabling interrupts when the lock is dropped, and exiting
> early at various points if an RT task is runnable on the current CPU.
>
> When applied to an RT kernel on a 72-core machine (2 threads per core), I
> saw significant reductions in latency as reported by rteval -- from
> over 500us to around 160us with hyperthreading disabled, and from
> over 1400us to around 380us with hyperthreading enabled.
>
> This isn't the first time something like this has been tried:
> https://lore.kernel.org/lkml/20121222003019.433916240@xxxxxxxxxxx/
> That attempt ended up being reverted:
> https://lore.kernel.org/lkml/5122CD9C.9070702@xxxxxxxxxx/
>
> The problem in that case was the failure to keep BH disabled, and the
> difficulty of fixing that when called from the post_schedule() hook.
> This patchset uses finish_task_switch() to call newidle_balance(), which
> enters in non-atomic context so we have full control over what we disable
> and when.
>
> There was a note at the end about wanting further discussion on the matter --
> does anyone remember if that ever happened and what the conclusion was?
> Are there any other issues with enabling interrupts here and/or moving
> the newidle_balance() call?
>

Random thought that just occurred to me: in the grand scheme of things,
with something in the same spirit as task-stealing (i.e. don't bother with
a full-fledged balance at newidle, just pick one spare task somewhere),
none of this would be required.

Sadly, I don't think anyone has been looking at it recently.
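
To make the task-stealing idea above a bit more concrete, here is a toy userspace sketch. It is not kernel code and does not reflect any posted patch: struct toy_rq and toy_steal_one() are made-up names, there is no locking, and the "first CPU with a spare task wins" policy is just an assumption for illustration. The point is only that a CPU going idle stops at the first spare task it finds instead of computing load statistics across the whole domain.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 8

/* Toy stand-in for a per-CPU runqueue; NOT the kernel's struct rq. */
struct toy_rq {
	int tasks[MAX_TASKS];	/* task ids; tasks[0] is treated as the running one */
	int nr_running;
};

/*
 * Hypothetical helper: when this_cpu is about to go idle, take a single
 * queued (not currently running) task from the first other CPU that has
 * one to spare.
 *
 * Returns the stolen task id, or -1 if no CPU has a spare task.
 */
static int toy_steal_one(struct toy_rq *rqs, int nr_cpus, int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < nr_cpus);

	struct toy_rq *dst = &rqs[this_cpu];

	if (dst->nr_running >= MAX_TASKS)
		return -1;

	for (int i = 1; i < nr_cpus; i++) {
		struct toy_rq *src = &rqs[(this_cpu + i) % nr_cpus];

		/* A CPU with at most one task has nothing to spare. */
		if (src->nr_running < 2)
			continue;

		/* Take the last queued task and stop: no full scan. */
		int task = src->tasks[--src->nr_running];

		dst->tasks[dst->nr_running++] = task;
		return task;
	}

	return -1;
}
```

With three toy CPUs where CPU 0 is idle, CPU 1 runs one task and CPU 2 has three, calling toy_steal_one(rqs, 3, 0) skips CPU 1 and pulls the last queued task from CPU 2. A real implementation would of course need locking, affinity checks and some notion of which CPUs are overloaded, all of which are left out here.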