Re: [RFC v5 4/6] sched/fair: Tune task wake-up logic to pack small background tasks on fewer cores

From: Dietmar Eggemann
Date: Tue Oct 08 2019 - 12:52:41 EST


[- Quentin Perret <quentin.perret@xxxxxxx>]
[+ Quentin Perret <qperret@xxxxxxxxxxx>]

See commit c193a3ffc282 ("mailmap: Update email address for Quentin Perret")

On 07/10/2019 18:53, Parth Shah wrote:
>
>
> On 10/7/19 5:49 PM, Vincent Guittot wrote:
>> On Mon, 7 Oct 2019 at 10:31, Parth Shah <parth@xxxxxxxxxxxxx> wrote:
>>>
>>> The algorithm finds the first non-idle core in the system and tries to
>>> place a task on an idle CPU in the chosen core. To maintain
>>> cache hotness, the search for a non-idle core starts from prev_cpu,
>>> which also reduces task ping-pong behaviour inside the core.
>>>
>>> Define a new method to select_non_idle_core which keeps track of the idle
>>> and non-idle CPUs in the core and, based on heuristics, determines if the
>>> core is sufficiently busy to place the incoming background task. The
>>> heuristic further classifies each non-idle CPU as either a busy (>12.5% util)
>>> CPU or an overutilized (>80% util) CPU.
>>> - A core containing more idle CPUs and no busy CPUs is not selected for
>>> packing
>>> - A core containing more than 1 overutilized CPU is exempted from
>>> task packing
>>> - Pack if there is at least one busy CPU and the overutilized CPU count is <2
>>>
>>> Value of 12.5% utilization for busy CPU gives sufficient heuristics for CPU
>>> doing enough work an

[...]
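
A minimal user-space sketch of the per-core heuristic quoted above, not the
code from the patch. The names sketch_cpu, SKETCH_CAPACITY and
sketch_pick_cpu_in_core are hypothetical, utilization is assumed to be on a
0..1024 scale, and an overutilized CPU is assumed to also count as busy
(the quoted text does not say either way).

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_CAPACITY 1024u	/* assumed full-scale utilization */

/* Hypothetical stand-in for the per-CPU state the heuristic looks at. */
struct sketch_cpu {
	bool idle;		/* true if nothing runs on this CPU */
	unsigned int util;	/* utilization, 0..SKETCH_CAPACITY */
};

/*
 * Return the index of the first idle CPU in the core if the core is
 * busy enough to pack onto, or -1 if the core should be skipped.
 */
static int sketch_pick_cpu_in_core(const struct sketch_cpu *cpus, size_t nr)
{
	unsigned int busy = 0, overutil = 0;
	int idle_cpu = -1;

	for (size_t i = 0; i < nr; i++) {
		if (cpus[i].idle) {
			if (idle_cpu < 0)
				idle_cpu = (int)i;
			continue;
		}
		/* overutilized: util > 80% of capacity */
		if (cpus[i].util * 5 > SKETCH_CAPACITY * 4)
			overutil++;
		/* busy: util > 12.5% of capacity (assumed to include overutilized) */
		if (cpus[i].util * 8 > SKETCH_CAPACITY)
			busy++;
	}

	/* Pack only with at least one busy CPU and fewer than 2 overutilized. */
	if (busy >= 1 && overutil < 2)
		return idle_cpu;

	return -1;
}
```

A core with no idle CPU also yields -1 here, since there is nowhere in that
core to place the task.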

>>> @@ -6483,7 +6572,11 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>>> } else if (sd_flag & SD_BALANCE_WAKE) { /* XXX always ? */
>>> /* Fast path */
>>>
>>> - new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
>>> + if (is_turbosched_enabled() && unlikely(is_background_task(p)))
>>> + new_cpu = turbosched_select_non_idle_core(p, prev_cpu,
>>> + new_cpu);
>>
>> Could you add turbosched_select_non_idle_core() similarly to
>> find_energy_efficient_cpu() ?
>> Add it at the beginning of select_task_rq_fair()
>> Return immediately with the CPU if you have found one
>> Or let the normal path select a CPU if the
>> turbosched_select_non_idle_core() has not been able to find a suitable
>> CPU for packing
>>
>
> Of course. I can do that.
> I was just not aware of the effect of wake_affine and so was waiting for
> such comments to be sure. Thanks for this.
> Maybe I can add it just below the sched_energy_present(){...} construct, giving
> precedence to EAS? I'm asking this because I remember Patrick telling me to
> leverage task packing for Android as well?
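
For reference, a rough user-space sketch of the control flow Vincent
suggests above: try packing first, return immediately on success, otherwise
fall through to the normal path. All names (sketch_select_cpu and the stub
callbacks) are hypothetical and only model the ordering. This is not
select_task_rq_fair() itself, and it does not model where EAS would sit.

```c
#include <stdbool.h>

/* Hypothetical callback type: returns a CPU number, or -1 for "none found". */
typedef int (*sketch_pick_fn)(int prev_cpu);

static int sketch_select_cpu(bool packing_enabled, bool background_task,
			     int prev_cpu, sketch_pick_fn try_pack,
			     sketch_pick_fn normal_path)
{
	if (packing_enabled && background_task) {
		int cpu = try_pack(prev_cpu);

		/* A packing target was found: return it right away. */
		if (cpu >= 0)
			return cpu;
	}

	/* No suitable CPU for packing: let the normal path decide. */
	return normal_path(prev_cpu);
}

/* Stubs for illustration only. */
static int sketch_pack_hit(int prev_cpu)    { return prev_cpu; }
static int sketch_pack_miss(int prev_cpu)   { (void)prev_cpu; return -1; }
static int sketch_normal_path(int prev_cpu) { return prev_cpu + 1; }
```

With these stubs, a background task gets the packing result when one exists;
every other case ends up in sketch_normal_path().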

I have a hard time imagining that Turbosched will be used in Android next
to EAS in the foreseeable future.

First of all, EAS already provides task packing on Performance Domain
(PD) level (a.k.a. cluster on traditional 2-cluster Arm/Arm64
big.LITTLE, or DynamIQ with Phantom domains (out-of-tree solution)).
This is where we can save energy without harming latency.

See the test results under '2.1 Energy test case' in

https://lore.kernel.org/r/20181203095628.11858-1-quentin.perret@xxxxxxx

There are 10 to 50 small (classified solely by task utilization) tasks
per test case and EAS shows an effect on energy consumption by packing
them onto the PD (cluster) of the small CPUs.

And second, the supported CPU topology is different from the one you're
testing on.

[...]