Re: [PATCH] sched: support dynamiQ cluster

From: Vincent Guittot
Date: Tue Apr 03 2018 - 08:18:15 EST


Hi Valentin,

On 3 April 2018 at 00:27, Valentin Schneider <valentin.schneider@xxxxxxx> wrote:
> Hi,
>
> On 30/03/18 13:34, Vincent Guittot wrote:
>> Hi Morten,
>>
> [..]
>>>
>>> As I see it, the main difference is that ASYM_PACKING attempts to pack
>>> all tasks regardless of task utilization on the higher capacity cpus
>>> whereas the "misfit task" series carefully picks cpus with tasks they
>>> can't handle so we don't risk migrating tasks which are perfectly
>>
>> That's one main difference: misfit task will leave middle-range load
>> tasks on little CPUs, which will not provide maximum performance.
>> I have put an example below
>>
>>> suitable for a little cpu to a big cpu unnecessarily. Also it is
>>> based directly on utilization and cpu capacity like the capacity
>>> awareness we already have to deal with big.LITTLE in the wake-up path.
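For reference, the fit test at the heart of the misfit series is based
directly on those two quantities. Roughly (a sketch along the lines of
the posted series, where capacity_margin = 1280 encodes the ~20%
headroom, i.e. a task must use more than ~80% of a CPU's capacity to
count as misfit):

        static unsigned long capacity_margin = 1280; /* ~20% margin */

        /* true if p's utilization fits within ~80% of @capacity */
        static inline int task_fits_capacity(struct task_struct *p,
                                             long capacity)
        {
                return capacity * 1024 > task_util(p) * capacity_margin;
        }

A task that keeps failing this test on its current CPU is flagged as a
misfit task and becomes a candidate for migration to a bigger CPU.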
>
> I think that bit is quite important. AFAICT, ASYM_PACKING disregards
> task utilization; it only makes sure that (with your patch) tasks will be
> migrated to big CPUs if those ever go idle (pulls at NEWLY_IDLE balance or
> later on during nohz balance). I didn't see anything related to ASYM_PACKING
> in the wake-up path.
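Yes, and for reference the only input ASYM_PACKING has is the per-CPU
priority assigned by the architecture; the comparison in mainline is
simply (kernel/sched/sched.h):

        static inline bool sched_asym_prefer(int a, int b)
        {
                return arch_asym_cpu_priority(a) > arch_asym_cpu_priority(b);
        }

Task utilization never enters this decision; the assumption with this
patch is that the bigger CPUs report the higher priority, so an idle
big CPU pulls runnable tasks from lower-priority (little) CPUs, however
small those tasks are.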
>
>>> Have you tried taking the misfit patches for a spin on your setup? I
>>> expect them to give you the same behaviour as you report above.
>>
>> So I have tried both your tests and mine on both patchsets, and they
>> provide the same results, which is somewhat expected as the benches run
>> for several seconds.
>> In order to highlight the main difference between misfit task and
>> ASYM_PACKING, I have reused your test and reduced the number of
>> max-requests for sysbench so that the test duration was in the range of
>> a few hundred ms.
>>
>> Hikey960 (emulated dynamiQ topology):
>>
>>          min       avg (stdev)        max
>> misfit   0.097500  0.114911 (+-10%)   0.138500
>> asym     0.092500  0.106072 (+- 6%)   0.122900
>>
>> In this case, we can see that ASYM_PACKING does better (~8%) because
>> it migrates the sysbench threads to the big cores as soon as they
>> become available, whereas misfit task has to wait for the utilization
>> to rise above the ~80% threshold, which takes around 70ms when
>> starting from a utilization of zero.
>>
>
> I believe ASYM_PACKING behaves better here because the workload is only
> sysbench threads. As stated above, since task utilization is disregarded, I

It behaves better because it doesn't wait for the task's utilization
to reach a given level before assuming the task needs high compute
capacity. Utilization gives an idea of the task's running time, not of
the performance level it needs.
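To put a number on that: with PELT's 32ms half-life, the utilization of
a task running flat out from zero ramps roughly as
util(t) = 1024 * (1 - 0.5^(t/32ms)), so crossing the ~80% threshold
(1024/1280) takes t = 32ms * log2(5) ~= 74ms, which matches the ~70ms
mentioned above. A quick standalone check (plain userspace C, not
kernel code):

        #include <math.h>
        #include <stdio.h>

        int main(void)
        {
                double half_life = 32.0;            /* PELT half-life in ms */
                double threshold = 1024.0 / 1280.0; /* ~80% fit threshold */

                /* util(t)/1024 = 1 - 0.5^(t/half_life) for an always-running task */
                double t = half_life * log2(1.0 / (1.0 - threshold));

                printf("ramp from 0 to %.0f%%: %.1f ms\n", threshold * 100.0, t);
                return 0;
        }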

> think we could have a scenario where the big CPUs are filled with "small"
> tasks and the LITTLE CPUs hold a few "big" tasks - because what mostly
> matters here is the order in which the tasks spawn, not their utilization -
> a placement that is potentially broken.
>
> There's that bit in *update_sd_pick_busiest()*:
>
>         /* No ASYM_PACKING if target CPU is already busy */
>         if (env->idle == CPU_NOT_IDLE)
>                 return true;
>
> So I'm not entirely sure how realistic that scenario is, but I suppose it
> could still happen. Food for thought in any case.
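For completeness, that test sits in the asym_packing tail of
update_sd_pick_busiest() (v4.16-ish kernel/sched/fair.c, trimmed):

        asym_packing:
                /* This is the busiest node in its class. */
                if (!(env->sd->flags & SD_ASYM_PACKING))
                        return true;

                /* No ASYM_PACKING if target CPU is already busy */
                if (env->idle == CPU_NOT_IDLE)
                        return true;

                /* Mark groups of lower priority than ourself as busy */
                if (sgs->sum_nr_running &&
                    sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) {
                        ...
                }

So packing is only skipped when the pulling CPU itself is busy; once a
big CPU goes idle it pulls purely by priority, with no look at task
size.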
>
> Regards,
> Valentin