Re: [PATCH v4 0/4] sched/fair: Capacity aware wakeup rework

From: Valentin Schneider
Date: Fri Feb 07 2020 - 07:41:52 EST


On 07/02/2020 10:42, Quentin Perret wrote:
>> Phantom domains (MC + DIE)
>> --------------------------
>>
>> This is mostly included for the sake of completeness.
>>
>> 100 iterations of 'sysbench --max-time=5 --max-requests=-1 --test=threads --num-threads=8 run':
>>
>> | | -PATCH | +PATCH | DELTA (%) |
>> |------+-------------+-------------+-----------|
>> | mean | 7317.940000 | 9328.470000 | +27.474 |
>> | std | 460.372682 | 181.528886 | -60.569 |
>> | min | 5888.000000 | 8832.000000 | +50.000 |
>> | 50% | 7271.000000 | 9348.000000 | +28.566 |
>> | 75% | 7497.500000 | 9477.250000 | +26.405 |
>> | 99% | 8464.390000 | 9634.160000 | +13.820 |
>> | max | 8602.000000 | 9650.000000 | +12.183 |
>
>
> So, it feels like the most interesting test would be
>
> 'baseline w/ phantom domains' vs 'this patch w/o phantom domains'
>
> right ? The 'baseline w/o phantom domains' case is arguably borked today,
> so it isn't that interesting (even though it performs well for the
> particular workload you choose here, as expected, but I guess you might
> see issues in others).
>
> So, IIUC, based on your results above, that would be:
>
> | | base+PD | patch+noPD | DELTA (%) |
> |------+-------------+-------------+-----------|
> | mean | 7317.940000 | 8731.610000 | +19.318 |
> | std | 460.372682 | 206.826912 | -55.074 |
> | min | 5888.000000 | 8251.000000 | +40.132 |
> | 50% | 7271.000000 | 8705.000000 | +19.722 |
> | 75% | 7497.500000 | 8868.250000 | +18.283 |
> | 99% | 8464.390000 | 9155.520000 | +8.165 |
> | max | 8602.000000 | 9207.000000 | +7.033 |
>
> Is that correct ?
>

That does look like it!

> If so, this patch series is still a very big win, and I'm all for
> getting it merged. But I find it interesting that the results aren't as
> good as having this patch _and_ phantom domains at the same time ...
>
> Any idea why having phantom domains helps ? select_idle_capacity()
> should behave the same w/ or w/o phantom domains given that you use
> sd_asym_cpucapacity directly.

My thoughts as well.

> I'm guessing something else has an impact
> here ? LB / misfit behaving a bit differently perhaps ?
>

Mm so phantom domains further restrict which CPUs can pull during LB (see
should_we_balance()). On a "flat" topology with just an MC domain, any
CPU is free to pull from any other CPU at any time. With phantom domains,
LB isn't restricted at MC (i.e. within a group of equal capacities), but
is at DIE, so we'll bail out more often there.

So even though we go through load balance more often with phantom domains
(here it would be every 4 jiffies for MC, 8 for DIE), we may end up bailing
out more often that without phantom domains. Then again, there barely is
more than one task per CPU in the sysbench case, so we don't have a very
expensive task scan.

Also, I don't think it plays a role here, but phantom domains are more
conservative in terms of up/downmigration.
Since you balance first at MC level, if you have an imbalance you will try
to solve it within CPUs of same capacity value. Without phantom domains,
you just have a single domain that spans everything, so you could move a
small task to a big CPU, despite there being spare LITTLEs.

Investing this some more is dangling somewhere on my todo-list.

> Thanks,
> Quentin
>