Re: [PATCH v5 2/2] sched/fair: update scale invariance of PELT
From: Quentin Perret
Date: Thu Nov 08 2018 - 06:35:44 EST
On Wednesday 07 Nov 2018 at 11:47:09 (+0100), Dietmar Eggemann wrote:
> The important bit for EAS is that it only uses utilization in the
> non-overutilized case. Here, utilization signals should look the same
> between the two approaches, not considering tasks with long periods like the
> 39/80ms example above.
> There are also some advantages for EAS with time scaling: (1) faster
> overutilization detection when a big task runs on a little CPU, (2) higher
> (initial) task utilization value when this task migrates from little to big
> CPU.
Agreed, these patches should help detect over-utilized scenarios
faster and more reliably, which is probably a good thing. I'll try to
have a look in more detail soon.
> We should run our EAS task placement tests with your time scaling patches.
Right, I tried these patches with the synthetic tests we usually run
against our upstream EAS dev branch (see [1]), and I couldn't see any
regression, which is a good sign :-)
<slightly off topic>
Since most people are probably not familiar with these tests, I'll try
to elaborate a little bit more. They are unit tests aimed at stressing
particular behaviours of the scheduler on asymmetric platforms. More
precisely, they check that capacity-awareness/misfit and EAS are
actually able to up-migrate and down-migrate tasks between big and
little CPUs when necessary.
The tests are based on rt-app and ftrace. They basically run a whole lot
of scenarios with rt-app (small tasks, big tasks, a mix of both, tasks
changing behaviour, ramping up, ramping down, ...), pull a trace of the
execution and check that:
1. the task(s) did not miss activations (which will basically be true
only if the scheduler managed to provide each task with enough CPU
capacity). We call that one 'test_slack' (see the short sketch after
this list);
2. the task placement is close enough to the optimal placement
energy-wise (which is computed off-line using the energy model
and the rt-app conf). We call that one 'test_task_placement'.
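To make the 'test_slack' idea a bit more concrete, here is a minimal
Python sketch of the kind of check it performs. This is *not* the
actual LISA implementation; the function names, the data format and
the example numbers are all made up for illustration (the real test
derives the per-activation timings from the rt-app log and the trace).

def activation_slack(period_s, activations):
    """
    activations: list of (start_s, end_s) tuples, one per period,
    where end_s is when that activation finished its work. The slack
    is the time left before the next activation is due.
    """
    return [period_s - (end - start) for start, end in activations]

def test_slack(period_s, activations):
    """Pass only if no activation overran its period (negative slack)."""
    return all(s >= 0 for s in activation_slack(period_s, activations))

# A 16ms periodic task whose third activation took 18ms would fail:
print(test_slack(0.016, [(0.000, 0.004), (0.016, 0.021), (0.048, 0.066)]))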
For example, in order to pass the test, a periodic task that ramps up
from 10% to 70% over (say) 5s should probably start its execution on
little CPUs to not waste energy, and get up-migrated to big CPUs later
on to not miss activations. Otherwise one of the two checks will fail.
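For what it's worth, here is a hedged back-of-the-envelope version of
that ramp example in Python. The capacity numbers are assumptions made
up for the illustration (they are not the actual Juno capacities, nor
the real rt-app configuration); the point is only to show where the
up-migration has to happen for both checks to pass.

# All numbers below are assumptions for illustration only.
LITTLE_CAPACITY = 0.45   # assumed relative capacity of a little CPU
BIG_CAPACITY = 1.0       # big CPU taken as the reference

def required_capacity(t_s, ramp_s=5.0, start=0.10, end=0.70):
    """Fraction of a big CPU the ramping task needs at time t."""
    frac = min(max(t_s / ramp_s, 0.0), 1.0)
    return start + frac * (end - start)

for t in range(0, 6):
    need = required_capacity(t)
    cpu = "little" if need <= LITTLE_CAPACITY else "big"
    print(f"t={t}s need={need:.2f} -> should run on a {cpu} CPU")

With these made-up numbers the task outgrows the little CPU around the
3s mark: keeping it there past that point would start missing
activations (and trip 'test_slack'), while starting it on a big CPU
from the beginning would waste energy (and trip 'test_task_placement').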
I'd like to emphasize that these test scenarios are *not* supposed to
look like real workloads at all. They've been designed with the sole
purpose of stressing specific code paths of the scheduler to spot any
obvious breakage. They've proven quite useful for us in the past.
All the tests are publicly available in the LISA repo [2].
</slightly off topic>
So, to come back to Vincent's patches, I managed to get a 10/10 pass
rate on most of the tests referred to as 'generic' in [1] on my Juno r0. The
kernel I tested had Morten's misfit patches, the EAS patches v8, and
Vincent's patches on top.
Although I still need to really get my head around all the implications
of changing PELT like that, I cannot see any obvious red flags from the
testing perspective here.
Thanks,
Quentin
---
[1] https://developer.arm.com/open-source/energy-aware-scheduling/eas-mainline-development
[2] https://github.com/ARM-software/lisa