Re: [PATCH] cpufreq/cppc: changing highest_perf to nominal_perf in cppc_cpufreq_cpu_init()

From: Viresh Kumar
Date: Thu Jun 06 2024 - 03:21:20 EST


On 05-06-24, 15:26, Ionela Voinescu wrote:
> > > > > cpufreq_start_governor() // governor: performance
> > > > > new_freq = cpufreq_driver->get() // if new_freq == policy->max
> > > > > if (policy->cur != new_freq)
> > > > > cpufreq_out_of_sync(policy, new_freq)
> > > > > ...
> > > > > policy->cur = new_freq
> > > I believe the problem is here ^^^^^^^^^^^^^^^^^^^^^^.
> > >
> > > cpufreq_verify_current_freq() should not update policy->cur unless a
> > > request to change frequency has actually reached the driver. I believe
> > > policy->cur should always reflect the request, not the actual current
> > > frequency of the CPU.

There are times when the core doesn't have any prior information about
the frequency, for example at driver probe time and resume. And so
needs to set policy->cur by reading it from the hardware.

> > > Given that new_freq is the current (hardware) frequency of the CPU,
> > > obtained via .get(), it can be the nominal frequency, as it is in your
> > > case, or any frequency, if there is any firmware/hardware capping in
> > > place.
> > >
> > > This causes the issue in your scenario, in which __cpufreq_driver_target()
> > > filters the request from the governor as it finds it equal to policy->cur,
> > > and it believes it's already set by hardware.

I am still not sure why mismatch happens at boot time here.

> > > This causes another issue in which scaling_cur_freq, which for some
> > > systems returns policy->cur, ends up returning the hardware frequency of
> > > the CPUs, and not the last frequency request, as it should:
> > >
> > > "scaling_cur_freq
> > > Current frequency of all of the CPUs belonging to this policy (in kHz).
> > >
> > > In the majority of cases, this is the frequency of the last P-state
> > > requested by the scaling driver from the hardware using the scaling
> > > interface provided by it, which may or may not reflect the frequency
> > > the CPU is actually running at (due to hardware design and other
> > > limitations)." [1]

There is discussion going on about this in another thread [1] now.

> > > Therefore policy->cur gets polluted with the hardware frequency of the
> > > CPU sampled at that one time, and this affects governor decisions, as
> > > in your case, and scaling_cur_freq feedback as well. This bad value will
> > > not change until there's another .target() or cpufreq_out_of_sync()
> > > call, which will never happen for fixed frequency governors like the
> > > performance governor.

--
viresh

[1] https://lore.kernel.org/all/20240603081331.3829278-2-beata.michalska@xxxxxxx/