Re: [PATCH v2 6/6] cpufreq/cppc: set the frequency used for capacity computation
From: Vincent Guittot
Date: Wed Oct 11 2023 - 10:26:07 EST
On Wed, 11 Oct 2023 at 12:27, Pierre Gondois <pierre.gondois@xxxxxxx> wrote:
>
> Hello Vincent,
>
> On 10/9/23 12:36, Vincent Guittot wrote:
> > The cppc cpufreq driver can register an artificial energy model. In such
> > a case, it also has to register the frequency that is used to define the
> > CPU capacity.
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > ---
> > drivers/cpufreq/cppc_cpufreq.c | 18 ++++++++++++++++++
> > 1 file changed, 18 insertions(+)
> >
> > diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> > index fe08ca419b3d..24c6ba349f01 100644
> > --- a/drivers/cpufreq/cppc_cpufreq.c
> > +++ b/drivers/cpufreq/cppc_cpufreq.c
> > @@ -636,6 +636,21 @@ static int populate_efficiency_class(void)
> > return 0;
> > }
> >
> > +
> > +static void cppc_cpufreq_set_capacity_ref_freq(struct cpufreq_policy *policy)
> > +{
> > + struct cppc_perf_caps *perf_caps;
> > + struct cppc_cpudata *cpu_data;
> > + unsigned int ref_freq;
> > +
> > + cpu_data = policy->driver_data;
> > + perf_caps = &cpu_data->perf_caps;
> > +
> > + ref_freq = cppc_cpufreq_perf_to_khz(cpu_data, perf_caps->highest_perf);
> > +
> > + per_cpu(capacity_ref_freq, policy->cpu) = ref_freq;
>
> 'capacity_ref_freq' seems to be updated only if CONFIG_ENERGY_MODEL is set. However in
> [1], get_capacity_ref_freq() relies on 'capacity_ref_freq'. The cpufreq_schedutil governor
> should have a valid 'capacity_ref_freq' value set if the CPPC cpufreq driver is used
> without energy model I believe.
We can disable it by setting capacity_ref_freq to 0, so that it falls
back on cpuinfo, like Intel and AMD, which use the default
SCHED_CAPACITY_SCALE capacity.
Could you provide me with more details about your platform? I am still
trying to understand how the CPU compute capacity is set up on your
system. How do you set the per_cpu cpu_scale variable? We should set the
ref freq at the same time.
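
To make the fallback concrete, here is a minimal userspace sketch of the
selection I have in mind. It is not the code from patch 4/6: the helper
name pick_ref_freq_khz() and its plain integer parameters are assumptions
for illustration. In the kernel the first value would come from the
per-CPU capacity_ref_freq and the second from the policy's cpuinfo max
frequency.

```c
#include <assert.h>

/*
 * Hypothetical model of the fallback: a reference frequency of 0 means
 * "not set by the driver", in which case the cpuinfo max frequency is
 * used instead. Both values are in kHz.
 */
static unsigned int pick_ref_freq_khz(unsigned int capacity_ref_freq_khz,
                                      unsigned int cpuinfo_max_freq_khz)
{
    /* A policy without a max frequency would make the fallback useless. */
    assert(cpuinfo_max_freq_khz > 0);

    if (capacity_ref_freq_khz)
        return capacity_ref_freq_khz;   /* driver provided a reference */

    return cpuinfo_max_freq_khz;        /* 0: fall back on cpuinfo */
}
```

With this shape, a driver that does not know its reference frequency only
has to leave the value at 0.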
>
> Also 'capacity_ref_freq' seems to be set only for 'policy->cpu'. I believe it should
> be set for the whole perf domain in case this 'policy->cpu' goes offline.
>
> Another thing, related to my comment on [1] and to [2], for CPPC the max capacity matches
> the boosting frequency. We have:
> 'non-boosted max capacity' < 'boosted max capacity'.
>
> If boosting is not enabled, the CPU utilization can still go above the 'non-boosted max
> capacity'. The overutilization of the system seems to be triggered by comparing the CPU
> util to the 'boosted max capacity'. So systems might not be detected as overutilized.
As Peter mentioned, we have to decide what the original compute
capacity of your CPUs is. This is usually the sustainable max compute
capacity, especially when using EAS and EM.
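
The overutilization concern above can be worked through with numbers. The
sketch below is a userspace model, not kernel code: fits_capacity() mirrors
the roughly 20% margin check used in kernel/sched/fair.c, while
scale_capacity() and the frequencies in the comments (3.0 GHz boosted,
2.4 GHz sustained) are made-up values for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE 1024UL

/* Utilization fits if it stays below ~80% of the capacity. */
static bool fits_capacity(unsigned long util, unsigned long capacity)
{
    return util * 1280 < capacity * 1024;
}

/* Simplified: ignores uclamp and other details of the real check. */
static bool cpu_overutilized(unsigned long util, unsigned long capacity)
{
    return !fits_capacity(util, capacity);
}

/* Hypothetical helper: capacity of freq_khz relative to ref_freq_khz. */
static unsigned long scale_capacity(unsigned long freq_khz,
                                    unsigned long ref_freq_khz)
{
    assert(ref_freq_khz > 0);
    return SCHED_CAPACITY_SCALE * freq_khz / ref_freq_khz;
}

/*
 * Example with assumed frequencies:
 *   boosted reference 3000000 kHz   -> capacity 1024
 *   sustained max     2400000 kHz   -> capacity 819
 * A utilization of 800 fits within 1024 but not within 819, so the CPU is
 * only flagged as overutilized when the sustained capacity is the
 * reference.
 */
```

This is why the choice of the original compute capacity matters: with the
boosted value as reference and boost disabled, a CPU in this example would
never be reported as overutilized at a utilization of 800.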
>
> For the EAS energy computation, em_cpu_energy() tries to predict the frequency that will
> be used. It is currently unknown to the function that the frequency request will be
> clamped by __resolve_freq():
> get_next_freq()
> \-cpufreq_driver_resolve_freq()
> \-__resolve_freq()
> This means that the energy computation might use boosting frequencies, which are not
> available.
>
> Regards,
> Pierre
>
> [1]: [PATCH v2 4/6] cpufreq/schedutil: use a fixed reference frequency
> [2]: https://lore.kernel.org/lkml/20230905113308.GF28319@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>
> > +}
> > +
> > static void cppc_cpufreq_register_em(struct cpufreq_policy *policy)
> > {
> > struct cppc_cpudata *cpu_data;
> > @@ -643,6 +658,9 @@ static void cppc_cpufreq_register_em(struct cpufreq_policy *policy)
> > EM_ADV_DATA_CB(cppc_get_cpu_power, cppc_get_cpu_cost);
> >
> > cpu_data = policy->driver_data;
> > +
> > + cppc_cpufreq_set_capacity_ref_freq(policy);
> > +
> > em_dev_register_perf_domain(get_cpu_device(policy->cpu),
> > get_perf_level_count(policy), &em_cb,
> > cpu_data->shared_cpu_map, 0);