Re: [RFC][PATCH v1 3/3] cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems

From: Dietmar Eggemann
Date: Tue May 21 2024 - 08:51:25 EST


On 06/05/2024 16:39, Rafael J. Wysocki wrote:
> On Thu, May 2, 2024 at 12:43 PM Dietmar Eggemann
> <dietmar.eggemann@xxxxxxx> wrote:
>>
>> On 25/04/2024 21:06, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>

[...]

>> So cpu_capacity has a direct mapping to itmt prio. cpu_capacity is itmt
>> prio with max itmt prio scaled to 1024.
>
> Right.
>
> The choice to make the ITMT prio reflect the capacity is deliberate,
> although this code works with values retrieved via CPPC (which are the
> same as the HWP_CAP values in the majority of cases but not always).
>
>> Running it on i7-13700K (while allowing SMT) gives:
>>
>> root@gulliver:~# dmesg | grep sched_set_itmt_core_prio
>> [ 3.957826] sched_set_itmt_core_prio() cpu=0 prio=68
>> [ 3.990401] sched_set_itmt_core_prio() cpu=1 prio=68
>> [ 4.015551] sched_set_itmt_core_prio() cpu=2 prio=68
>> [ 4.040720] sched_set_itmt_core_prio() cpu=3 prio=68
>> [ 4.065871] sched_set_itmt_core_prio() cpu=4 prio=68
>> [ 4.091018] sched_set_itmt_core_prio() cpu=5 prio=68
>> [ 4.116175] sched_set_itmt_core_prio() cpu=6 prio=68
>> [ 4.141374] sched_set_itmt_core_prio() cpu=7 prio=68
>> [ 4.166543] sched_set_itmt_core_prio() cpu=8 prio=69
>> [ 4.196289] sched_set_itmt_core_prio() cpu=9 prio=69
>> [ 4.214964] sched_set_itmt_core_prio() cpu=10 prio=69
>> [ 4.239281] sched_set_itmt_core_prio() cpu=11 prio=69
>
> CPUs 8 - 11 appear to be "favored cores" that can turbo up higher than
> the other P-cores.
>
>> [ 4.263438] sched_set_itmt_core_prio() cpu=12 prio=68
>> [ 4.283790] sched_set_itmt_core_prio() cpu=13 prio=68
>> [ 4.308905] sched_set_itmt_core_prio() cpu=14 prio=68
>> [ 4.331751] sched_set_itmt_core_prio() cpu=15 prio=68
>> [ 4.356002] sched_set_itmt_core_prio() cpu=16 prio=42
>> [ 4.381639] sched_set_itmt_core_prio() cpu=17 prio=42
>> [ 4.395175] sched_set_itmt_core_prio() cpu=18 prio=42
>> [ 4.425625] sched_set_itmt_core_prio() cpu=19 prio=42
>> [ 4.449670] sched_set_itmt_core_prio() cpu=20 prio=42
>> [ 4.479681] sched_set_itmt_core_prio() cpu=21 prio=42
>> [ 4.506319] sched_set_itmt_core_prio() cpu=22 prio=42
>> [ 4.523774] sched_set_itmt_core_prio() cpu=23 prio=42
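
As a quick sanity check of the "max itmt prio scaled to 1024" mapping, the
arithmetic can be sketched as below. This is not the code from the patch:
the helper name prio_to_capacity() is made up for illustration and the
truncating integer division is an assumption. The priorities are the three
distinct values from the dmesg output above.

```python
SCHED_CAPACITY_SCALE = 1024  # capacity assigned to the fastest CPU


def prio_to_capacity(prio, max_prio):
    """Scale an ITMT priority so that max_prio maps to 1024.

    Hypothetical helper; truncating integer division is an assumption.
    """
    return prio * SCHED_CAPACITY_SCALE // max_prio


# Distinct ITMT priorities seen in the dmesg output above.
prios = {
    "favored P-cores (CPUs 8-11)": 69,
    "other P-cores": 68,
    "E-cores (CPUs 16-23)": 42,
}
max_prio = max(prios.values())

capacities = {name: prio_to_capacity(p, max_prio) for name, p in prios.items()}
for name, cap in capacities.items():
    print(f"{name}: prio={prios[name]} capacity={cap}")
```

With turbo enabled this gives 1024 for the favored cores and 1009 for the
other P-cores, which is the "1009 vs 1024" difference asked about further
down.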

I wonder what the relation is between this CPU capacity value based on
HWP_CAP and the per-IPC class performance values of the 'HFI
performance and efficiency score' table.

Running '[PATCH v3 00/24] sched: Introduce classes of tasks for load
balance' on i7-13700K w/ 'nosmt' I get:

                     Score
CPUs           Class   0      1      2      3
                       SSE    AVX2   VNNI   PAUSE

0,2,4,6,12,14          68     80     106    53
8,10                   69     81     108    54
16-23                  42     42     42     42

Looks like the HWP_CAP values are in sync with the scores of IPC Class
0. I was expecting the HWP_CAP values to reflect more of an average over
all classes. Or maybe I'm misinterpreting this relation?
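
To make the comparison concrete, here is a small sketch that puts the
class 0 score, the plain mean over all four classes, and the ITMT priority
side by side. The scores are copied from the table above and the priorities
from the earlier dmesg output for the same CPU numbers. The unweighted mean
is only my guess at what "an average over all classes" could look like.

```python
from statistics import mean

# HFI scores for classes 0..3, copied from the table above
# (i7-13700K, 'nosmt').
hfi_scores = {
    "0,2,4,6,12,14": [68, 80, 106, 53],
    "8,10": [69, 81, 108, 54],
    "16-23": [42, 42, 42, 42],
}

# ITMT priorities from the sched_set_itmt_core_prio() output for the
# same CPU numbers.
itmt_prio = {"0,2,4,6,12,14": 68, "8,10": 69, "16-23": 42}

for cpus, scores in hfi_scores.items():
    print(f"CPUs {cpus}: class0={scores[0]} "
          f"mean={mean(scores)} itmt_prio={itmt_prio[cpus]}")
```

For the P-cores the mean (76.75 and 78) sits well above the ITMT priority,
while the class 0 score matches it exactly in every row.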

[...]

>>> If the driver's "no_turbo" sysfs attribute is updated, all of the CPU
>>> capacity information is computed from scratch to reflect the new turbo
>>> status.
>>
>> So if I do:
>>
>> echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
>>
>> I get:
>>
>> [ 1692.801368] hybrid_update_cpu_scaling() called
>> [ 1692.801381] hybrid_update_cpu_scaling() max_cap_perf=44, max_perf_cpu=0
>> [ 1692.801389] hybrid_set_cpu_capacity() cpu=1 cap=1024
>> [ 1692.801395] hybrid_set_cpu_capacity() cpu=2 cap=1024
>> [ 1692.801399] hybrid_set_cpu_capacity() cpu=3 cap=1024
>> [ 1692.801402] hybrid_set_cpu_capacity() cpu=4 cap=1024
>> [ 1692.801405] hybrid_set_cpu_capacity() cpu=5 cap=1024
>> [ 1692.801408] hybrid_set_cpu_capacity() cpu=6 cap=1024
>> [ 1692.801410] hybrid_set_cpu_capacity() cpu=7 cap=1024
>> [ 1692.801413] hybrid_set_cpu_capacity() cpu=8 cap=1024
>> [ 1692.801416] hybrid_set_cpu_capacity() cpu=9 cap=1024
>> [ 1692.801419] hybrid_set_cpu_capacity() cpu=10 cap=1024
>> [ 1692.801422] hybrid_set_cpu_capacity() cpu=11 cap=1024
>> [ 1692.801425] hybrid_set_cpu_capacity() cpu=12 cap=1024
>> [ 1692.801428] hybrid_set_cpu_capacity() cpu=13 cap=1024
>> [ 1692.801431] hybrid_set_cpu_capacity() cpu=14 cap=1024
>> [ 1692.801433] hybrid_set_cpu_capacity() cpu=15 cap=1024
>> [ 1692.801436] hybrid_set_cpu_capacity() cpu=16 cap=605
>> [ 1692.801439] hybrid_set_cpu_capacity() cpu=17 cap=605
>> [ 1692.801442] hybrid_set_cpu_capacity() cpu=18 cap=605
>> [ 1692.801445] hybrid_set_cpu_capacity() cpu=19 cap=605
>> [ 1692.801448] hybrid_set_cpu_capacity() cpu=20 cap=605
>> [ 1692.801451] hybrid_set_cpu_capacity() cpu=21 cap=605
>> [ 1692.801453] hybrid_set_cpu_capacity() cpu=22 cap=605
>> [ 1692.801456] hybrid_set_cpu_capacity() cpu=23 cap=605
>>
>> Turbo on this machine stands only for the cpu_capacity diff 1009 vs 1024?
>
> Not really.
>
> The capacity of the fastest CPU is always 1024 and the capacities of
> all of the other CPUs are adjusted to that.
>
> When turbo is disabled, the capacity of the "favored cores" is the
> same as for the other P-cores (i.e. 1024) and the capacity of E-cores
> is relative to that.
>
> Of course, this means that task placement may be somewhat messed up
> after disabling or enabling turbo (which is a global switch), but I
> don't think that there is a way to avoid it.

I assume that this is OK. In task placement we don't deal with a system
of perfectly aligned values (including their sums) anyway.
And we recreate the sched domains (including updating the capacity sums
on sched groups) after this, so load balance (smp nice etc.) should
be fine too.
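
For reference, the size of the shift when toggling turbo can be estimated
from the numbers in this thread. The sketch below is not taken from the
patch: scale() is a hypothetical helper that assumes truncating integer
division, and the E-core perf value of 26 is not printed anywhere in the
logs, it is only backed out from cap=605 and max_cap_perf=44.

```python
SCHED_CAPACITY_SCALE = 1024


def scale(perf, max_perf):
    """Hypothetical helper: scale perf so that max_perf maps to 1024."""
    return perf * SCHED_CAPACITY_SCALE // max_perf


# Turbo disabled: values from the hybrid_update_cpu_scaling() and
# hybrid_set_cpu_capacity() lines quoted above.
max_cap_perf = 44
ecore_cap_no_turbo = 605

# Inferred, not logged: the E-core perf value that yields cap=605.
ecore_perf_no_turbo = round(ecore_cap_no_turbo * max_cap_perf
                            / SCHED_CAPACITY_SCALE)

# Turbo enabled: ITMT priorities 69 (favored), 68 (other P-cores),
# 42 (E-cores) from the earlier dmesg output.
pcore_cap_turbo = scale(68, 69)
ecore_cap_turbo = scale(42, 69)

print(f"non-favored P-cores: {pcore_cap_turbo} -> {SCHED_CAPACITY_SCALE}")
print(f"E-cores:             {ecore_cap_turbo} -> {ecore_cap_no_turbo}")
print(f"inferred E-core perf with turbo off: {ecore_perf_no_turbo}")
```

So under these assumptions the relative E-core capacity moves by about 3%
(623 vs 605) and the non-favored P-cores by about 1.5% (1009 vs 1024).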