Re: [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu

From: Jie Zhan
Date: Tue Aug 27 2024 - 09:04:07 EST




On 26/08/2024 15:24, Beata Michalska wrote:
...
Hi Beata,
Hi Jie,
I've recently tested this patchset on a Kunpeng system.
It works as expected when reading scaling_cur_freq.
The frequency samples are much stabler than what cppc_cpufreq returns.
Thank you for giving it a spin.
(and apologies for late reply)
A few minor things.

1. I observed larger errors on idle cpus than busy cpus, though it's just up
to 1%.
Not sure if this comes from the uncertain time interval between the last
tick and entering idle.
The shorter averaging interval, the larger error, I supposed.
All right - will look into it.
Just for my benefit: that diff is strictly between arch_freq_avg_get_on_cpu
and cpufreq_driver->get(policy->cpu) ?
I can't say whether it's "strictly" between them or not because driver->get()
shows a fluctuating value.
On my platform, cppc_cpufreq's driver->get() sometimes shows large errors on
busy cpus (as reported by recent emails), but quite accurate on idle cpus (<1%).

With this patch, the "error" on idle cpus as mentioned above is typically <1%,
hence better in general.
2. In the current implementation, the resolution of frequency would be:
max_freq_khz / SCHED_CAPACITY_SCALE
This looks a bit unnecessary to me.

It's supposed to get a better resolution if we can do this in
arch_freq_get_on_cpu():

freq = delta_cycle_cnt * max_freq_khz / delta_const_cnt

which may require caching both current and previous sets of counts in the
per-cpu struct amu_cntr_sample.

arch_freq_get_on_cpu relies on the frequency scale factor to derive the average
frequency. The scale factor is being calculated based on the deltas you have
mentioned and arch_max_freq_scale which uses SCHED_CAPACITY_SCALE*2 factor to
accommodate for rather low reference frequencies. arch_freq_get_on_cpu just does
somewhat reverse computation to that.
Yeah I understood this.

arch_freq_get_on_cpu() now returns:
freq = (arch_scale_freq_capacity() * arch_scale_freq_ref()) >> SCHED_CAPACITY_SHIFT

The frequency resolution is (arch_scale_freq_ref() >> SCHED_CAPACITY_SHIFT), which
is equivalent to max_freq_khz / SCHED_CAPACITY_SCALE.

If we can directly do
freq = delta_cycle_cnt * ref_freq_khz / delta_const_cnt
in arch_freq_get_on_cpu(), it's supposed to have a 1KHz resolution.
(sorry for the wrong multiplier in the last reply)

Just similar to what's done in [1].

It could be more worthwhile to have a good resolution than utilising the
arch_topology framework?

[1] https://lore.kernel.org/all/20240229162520.970986-2-vanshikonda@xxxxxxxxxxxxxxxxxxxxxx/
---
BR
Beata
Kind regards,
Jie