Re: [PATCH v5 2/2] sched/fair: update scale invariance of PELT

From: Vincent Guittot
Date: Thu Nov 08 2018 - 11:05:01 EST


On Thu, 8 Nov 2018 at 12:35, Quentin Perret <quentin.perret@xxxxxxx> wrote:
>
> On Wednesday 07 Nov 2018 at 11:47:09 (+0100), Dietmar Eggemann wrote:
> > The important bit for EAS is that it only uses utilization in the
> > non-overutilized case. Here, utilization signals should look the same
> > between the two approaches, not considering tasks with long periods like the
> > 39/80ms example above.
> > There are also some advantages for EAS with time scaling: (1) faster
> > overutilization detection when a big task runs on a little CPU, and (2) a
> > higher (initial) task utilization value when such a task migrates from a
> > little to a big CPU.
>
> Agreed, these patches should help detect the over-utilized scenarios
> faster and more reliably, which is probably a good thing. I'll try to
> have a look in more detail soon.
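
To illustrate the 1st point, here is a toy user-space model of the two
scale-invariance schemes (not the kernel code, just the idea, with
illustrative numbers) for a task that runs 100% of the time on a CPU of
capacity 512:

/*
 * Toy user-space model of the two scale-invariance schemes -- not the
 * kernel code, just the idea -- for a task running 100% of the time on
 * a CPU of capacity 512. PELT's ~32ms half-life decay is approximated
 * with per-ms steps, so the numbers are only illustrative.
 *
 * build: gcc -O2 -o pelt-toy pelt-toy.c -lm
 */
#include <math.h>
#include <stdio.h>

#define SCALE           1024.0
#define CAP              512.0  /* little CPU capacity */
#define HALFLIFE_MS       32.0

int main(void)
{
        double y = pow(0.5, 1.0 / HALFLIFE_MS); /* decay per ms of PELT time */
        double ys = pow(y, CAP / SCALE);        /* decay per wall-clock ms, scaled time */
        double ou = 0.8 * CAP;                  /* overutilization threshold */
        double util_delta = 0.0, util_time = 0.0;
        int cross_delta = 0, cross_time = 0;
        int t;

        for (t = 1; t <= 500; t++) {            /* wall-clock ms */
                /* delta scaling: the contribution is scaled by capacity */
                util_delta = util_delta * y + (1.0 - y) * CAP;
                /* time scaling: the PELT clock runs at CAP/SCALE while running */
                util_time = util_time * ys + (1.0 - ys) * SCALE;

                if (!cross_delta && util_delta > ou)
                        cross_delta = t;
                if (!cross_time && util_time > ou)
                        cross_time = t;
        }

        printf("OU threshold (%.0f) crossed after: delta %d ms, time %d ms\n",
               ou, cross_delta, cross_time);
        printf("util after 500 ms: delta %.0f, time %.0f\n",
               util_delta, util_time);
        return 0;
}

In this model, contribution (delta) scaling saturates at the CPU capacity
and only crosses the ~80% overutilization margin after ~75ms, whereas
time scaling keeps ramping towards 1024 and crosses the same threshold
after ~48ms. The same effect explains the 2nd point: the utilization a
task builds up on a little CPU is closer to its real demand, so it is
higher when the task migrates to a big CPU.
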
>
> > We should run our EAS task placement tests with your time scaling patches.
>
> Right, I tried these patches with the synthetic tests we usually run
> against our upstream EAS dev branch (see [1]), and I couldn't see any
> regression, which is a good sign :-)

Thanks for testing

>
>
> <slightly off topic>
> Since most people are probably not familiar with these tests, I'll try
> to elaborate a little bit more. They are unit tests aimed at stressing
> particular behaviours of the scheduler on asymmetric platforms. More
> precisely, they check that capacity-awareness/misfit and EAS are
> actually able to up-migrate and down-migrate tasks between big and
> little CPUs when necessary.
>
> The tests are based on rt-app and ftrace. They basically run a whole lot
> of scenarios with rt-app (small tasks, big tasks, a mix of both, tasks
> changing behaviour, ramping up, ramping down, ...), pull a trace of the
> execution and check that:
>
> 1. the task(s) did not miss activations (which will basically be true
> only if the scheduler managed to provide each task with enough CPU
> capacity). We call that one 'test_slack';
>
> 2. the task placement is close enough to the optimal placement
> energy-wise (which is computed off-line using the energy model
> and the rt-app conf). We call that one 'test_task_placement'.
>
> For example, in order to pass the test, a periodic task that ramps up
> from 10% to 70% over (say) 5s should probably start its execution on
> little CPUs to not waste energy, and get up-migrated to big CPUs later
> on to not miss activations. Otherwise one of the two checks will fail.
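>
> To give a rough idea, the first check conceptually boils down to the
> sketch below. This is *not* the actual LISA code (the real checks are
> Python and work on ftrace events), and the trace data is made up purely
> for illustration:
>
> /* Simplified sketch of the 'test_slack' idea -- not the real LISA code. */
> #include <stdio.h>
>
> struct activation {
>         double period_us;       /* rt-app activation period */
>         double runtime_us;      /* measured time to complete the activation */
> };
>
> /* An activation is missed when its work did not fit in the period. */
> static int count_missed(const struct activation *a, int nr)
> {
>         int missed = 0;
>
>         for (int i = 0; i < nr; i++)
>                 if (a[i].period_us - a[i].runtime_us < 0)
>                         missed++;
>
>         return missed;
> }
>
> int main(void)
> {
>         /* made-up numbers: 16ms period, the third activation overruns */
>         const struct activation a[] = {
>                 { 16000, 4200 }, { 16000, 4500 },
>                 { 16000, 17300 }, { 16000, 4100 },
>         };
>         int nr = sizeof(a) / sizeof(a[0]);
>
>         printf("%d/%d activations missed\n", count_missed(a, nr), nr);
>         return 0;
> }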
>
> I'd like to emphasize that these test scenarios are *not* supposed to
> look like real workloads at all. They've been designed with the sole purpose
> of stressing specific code paths of the scheduler to spot any obvious
> breakage. They've proven quite useful for us in the past.
>
> All the tests are publicly available in the LISA repo [2].
> </slightly off topic>
>
>
> So, to come back to Vincent's patches, I managed to get a 10/10 pass rate
> on most of the tests referred to as 'generic' in [1] on my Juno r0. The
> kernel I tested had Morten's misfit patches, the EAS patches v8, and
> Vincent's patches on top.
>
> Although I still need to really get my head around all the implications
> of changing PELT like that, I cannot see any obvious red flags from the
> testing perspective here.
>
> Thanks,
> Quentin
>
> ---
> [1] https://developer.arm.com/open-source/energy-aware-scheduling/eas-mainline-development
> [2] https://github.com/ARM-software/lisa