Hi,

Right, arch_freq_get_on_cpu() will not return 0 for idle CPUs.
On Friday 05 Jan 2024 at 15:04:47 (+0800), lihuisong (C) wrote:
Hi Vanshi,

With the implementation at [1], arch_freq_get_on_cpu() will not return 0
On 2024/1/5 8:48, Vanshidhar Konda wrote:
On Thu, Jan 04, 2024 at 05:36:51PM +0800, lihuisong (C) wrote:
From the approach in [1], if all CPUs (one or more cores) under one policy
On 2024/1/4 1:53, Ionela Voinescu wrote:
I think the changes in [1] would work better when CPUs may be idle. With
Hi,

It would work for me AFAICS.
On Tuesday 12 Dec 2023 at 15:26:17 (+0800), Huisong Li wrote:
Would this [1] alternative solution work for you?
Many developers have found that the current CPU frequency is greater than
the maximum frequency of the platform, please see [1], [2] and [3].
In scenarios with high memory access pressure, the patch [1] has
shown that the significant latency of cpc_read(), which is used to obtain
the delivered and reference performance counters, causes an absurd frequency.
The sampling intervals for these counters are critical and are expected
to be equal. However, the varying latency of cpc_read() has a direct
impact on their sampling intervals.
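To make the skew concrete, here is a minimal userspace sketch (not the kernel code; est_freq_khz and the sample numbers are invented for illustration) of how unequal sampling windows distort the delivered/reference ratio:

```c
#include <stdint.h>

/*
 * CPPC-style frequency estimation divides the delta of the delivered
 * counter by the delta of the reference counter, assuming both deltas
 * cover the same window of time.
 */
static uint64_t est_freq_khz(uint64_t ref_khz,
                             uint64_t d0, uint64_t d1,
                             uint64_t r0, uint64_t r1)
{
        return ref_khz * (d1 - d0) / (r1 - r0);
}

/*
 * With equal windows a 2.7 GHz CPU reports 2700000 kHz:
 *   est_freq_khz(2700000, 0, 1000000, 0, 1000000)
 * But if a slow cpc_read() stretches only the delivered counter's
 * window by 5%, the same CPU appears to run at 2835000 kHz:
 *   est_freq_khz(2700000, 0, 1050000, 0, 1000000)
 */
```

This is why reading the two counters as close together as possible matters more than the absolute latency of each read.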
Because "arch_freq_scale" is also derived from the AMU core and constant
counters, which are read together.
But, from their discussion thread, it seems that there are some tricky
points to clarify or consider.
this patch we would have to wake any core that is in an idle state to
read the AMU counters. Worst case, if core 0 is trying to read the CPU
frequency of all cores, it may need to wake up all the other cores to
read the AMU counters.
are idle, the CPU frequency still cannot be obtained for them, right?
In this case, the API in [1] will return 0 and we have to fall back to
calling cpufreq_driver->get() for cpuinfo_cur_freq.
Then we still have to face the issue this patch mentions.
for idle CPUs and the get() callback will not be called to wake up the
CPUs.
But this frequency is from the last tick,
Worst case, arch_freq_get_on_cpu() will return a frequency based on the
AMU counter values obtained on the last tick on that CPU. But if that CPU
is not a housekeeping CPU, a housekeeping CPU in the same policy will be
selected, as it would have had a more recent tick, and therefore a more
recent frequency value for the domain.
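The housekeeping fallback described above can be modelled with a toy bitmask sketch (illustration only; pick_feedback_cpu and the plain 64-bit masks are invented, whereas the kernel uses cpumask_t and the housekeeping_cpumask() API):

```c
#include <stdint.h>

/*
 * Toy model: each bit is one CPU. Given the mask of CPUs in the same
 * cpufreq policy and the mask of housekeeping CPUs, pick a housekeeping
 * CPU from the policy; its tick runs regularly, so its cached AMU
 * sample is the most recent one for the domain.
 * Returns the chosen CPU number, or -1 if none qualifies.
 */
static int pick_feedback_cpu(uint64_t policy_mask, uint64_t hk_mask)
{
        uint64_t cand = policy_mask & hk_mask;

        if (!cand)
                return -1;
        return __builtin_ctzll(cand);   /* lowest set bit = first CPU */
}
```

When no housekeeping CPU shares the policy, a caller would have to fall back to another source, which is the case discussed above.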
It depends on the housekeeping CPUs.
I understand that the frequency returned here will not be up to date,
but there's no proper frequency feedback for an idle CPU. If one only
wakes up a CPU to sample counters, before the CPU goes back to sleep,
the obtained frequency feedback is meaningless.
What are the conditions you are referring to?
For systems with 128 cores or more, this could be very expensive and
happen very frequently.
AFAICS, the approach in [1] would avoid this cost.
But the CPU frequency is just an average value for the last tick period
instead of the current one the CPU actually runs at.
In addition, there are some conditions for using 'arch_freq_scale' in this
approach.
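For reference, the conversion [1] performs from arch_freq_scale back to a frequency can be sketched roughly like this (a simplified userspace model; scale_to_freq_khz is an invented name and the exact kernel expression may differ):

```c
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT 10
#define SCHED_CAPACITY_SCALE (1 << SCHED_CAPACITY_SHIFT)

/*
 * arch_freq_scale is a per-CPU ratio of delivered to constant AMU
 * cycles, normalized so that SCHED_CAPACITY_SCALE (1024) means
 * "running at maximum frequency". The [1]-style approach turns that
 * ratio back into a frequency in kHz.
 */
static uint64_t scale_to_freq_khz(uint64_t freq_scale, uint64_t max_khz)
{
        return (freq_scale * max_khz) >> SCHED_CAPACITY_SHIFT;
}
```

For example, a scale of 512 on a 2.7 GHz part corresponds to roughly half the maximum frequency.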
An average value for the CPU frequency is OK. It would be better if it had less delay.
So I'm not sure if this approach can entirely cover the frequency
discrepancy issue.
Unfortunately there is no perfect frequency feedback. By the time you
observe/use the value of scaling_cur_freq/cpuinfo_cur_freq, the frequency
of the CPU might have already changed. Therefore, an average value might
be a better indication of the recent performance level of a CPU.
I have tested it on my platform (64 CPUs, SMT off, CPU base frequency: 2.7 GHz).
Would you be able to test [1] on your platform and usecase?
[snip]
Many thanks,
Ionela.
/Huisong
[1] https://lore.kernel.org/lkml/20231127160838.1403404-1-beata.michalska@xxxxxxx/
Thanks,
Ionela.
This patch adds an interface, cpc_read_arch_counters_on_cpu, to read
the delivered and reference performance counters together. According to my
test [4], the discrepancy of the current CPU frequency in scenarios with
high memory access pressure is lower than 0.2% with the stress-ng
application.
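The intent of the new interface can be sketched in userspace like this (the counter readers below are mocks; only the name cpc_read_arch_counters_on_cpu comes from this patch, the rest is illustrative):

```c
#include <stdint.h>

struct amu_sample {
        uint64_t delivered;
        uint64_t reference;
};

/* Mock counter sources standing in for the real AMU register reads. */
static uint64_t mock_delivered = 1000;
static uint64_t mock_reference = 2000;

static uint64_t read_delivered(void) { return mock_delivered; }
static uint64_t read_reference(void) { return mock_reference; }

/*
 * Read the delivered and reference counters back-to-back, with nothing
 * in between, so both samples cover (almost) the same interval and the
 * variable CPC transport latency cannot open a gap between them.
 */
static void cpc_read_arch_counters_on_cpu(struct amu_sample *s)
{
        s->delivered = read_delivered();
        s->reference = read_reference();
}
```

The point is the pairing of the two reads, not the individual accessors.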
[1] https://lore.kernel.org/all/20231025093847.3740104-4-zengheng4@xxxxxxxxxx/
[2] https://lore.kernel.org/all/20230328193846.8757-1-yang@xxxxxxxxxxxxxxxxxxxxxx/
[3] https://lore.kernel.org/all/20230418113459.12860-7-sumitg@xxxxxxxxxx/
[4] My local test:
The testing platform enable SMT and include 128 logical CPU in total,
and CPU base frequency is 2.7GHz. Reading "cpuinfo_cur_freq" for each
physical core on platform during the high memory access pressure from
stress-ng, and the output is as follows:
0: 2699133 2: 2699942 4: 2698189 6: 2704347
8: 2704009 10: 2696277 12: 2702016 14: 2701388
16: 2700358 18: 2696741 20: 2700091 22: 2700122
24: 2701713 26: 2702025 28: 2699816 30: 2700121
32: 2700000 34: 2699788 36: 2698884 38: 2699109
40: 2704494 42: 2698350 44: 2699997 46: 2701023
48: 2703448 50: 2699501 52: 2700000 54: 2699999
56: 2702645 58: 2696923 60: 2697718 62: 2700547
64: 2700313 66: 2700000 68: 2699904 70: 2699259
72: 2699511 74: 2700644 76: 2702201 78: 2700000
80: 2700776 82: 2700364 84: 2702674 86: 2700255
88: 2699886 90: 2700359 92: 2699662 94: 2696188
96: 2705454 98: 2699260 100: 2701097 102: 2699630
104: 2700463 106: 2698408 108: 2697766 110: 2701181
112: 2699166 114: 2701804 116: 2701907 118: 2701973
120: 2699584 122: 2700474 124: 2700768 126: 2701963
Signed-off-by: Huisong Li <lihuisong@xxxxxxxxxx>
---