Re: [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu

From: Jie Zhan
Date: Wed Aug 14 2024 - 04:05:59 EST



On 11/07/2024 21:59, Beata Michalska wrote:
Hi Catalin,

On Mon, Jul 08, 2024 at 06:10:02PM +0100, Catalin Marinas wrote:
Hi Beata,

On Mon, Jun 03, 2024 at 09:21:50AM +0100, Beata Michalska wrote:
Introducing arm64 specific version of arch_freq_get_on_cpu, cashing on
existing implementation for FIE and AMUv1 support: the frequency scale
factor, updated on each sched tick, serves as a base for retrieving
the frequency for a given CPU, representing an average frequency
reported between the ticks - thus its accuracy is limited.

The changes have been rather lightly (due to some limitations) tested on
an FVP model. Note that some small discrepancies have been observed while
testing (on the model) and this is currently being investigated, though it
should not have any significant impact on the overall results.
What's the plan with this series? Are you still investigating those
discrepancies or is it good to go?

Overall it should be good to go with small caveat:
as per discussion [1] we might need to provide new sysfs attribute exposing an
average frequency instead of plugging new code under existing cpuinfo_cur_freq.
This is to avoid messing up with other archs and make a clean distinction on
which attribute provides what information.
As such, the arch_freq_get_on_cpu implementation provided within this series
[PATCH v6 3/4] will most probably be shifted to a new function.

Hopefully will be able to send those changes soon.

---
[1] https://lore.kernel.org/all/ZmrB_DqtmVpvG30l@xxxxxxx/
---
BR
Beata

--
Catalin

Hi Beata,

I've recently tested this patchset on a Kunpeng system.
It works as expected when reading scaling_cur_freq.
The frequency samples are much stabler than what cppc_cpufreq returns.

A few minor things.

1. I observed larger errors on idle cpus than busy cpus, though it's just up to 1%.
Not sure if this comes from the uncertain time interval between the last tick and entering idle.
The shorter averaging interval, the larger error, I supposed.

2. In the current implementation, the resolution of frequency would be:
max_freq_khz / SCHED_CAPACITY_SCALE
This looks a bit unnecessary to me.

It's supposed to get a better resolution if we can do this in arch_freq_get_on_cpu():

freq = delta_cycle_cnt * max_freq_khz / delta_const_cnt

which may require caching both current and previous sets of counts in the per-cpu struct amu_cntr_sample.

Kind regards,
Jie