Re: [PATCH] Revert "clk: Fix invalid execution of clk_set_rate"

From: Stephen Boyd
Date: Tue Dec 03 2024 - 16:02:52 EST


Quoting Manivannan Sadhasivam (2024-12-03 01:21:51)
> On Tue, Dec 03, 2024 at 09:25:01AM +0100, Johan Hovold wrote:
> > [ +CC: Viresh and Sudeep ]
> >
> > On Mon, Dec 02, 2024 at 05:20:06PM -0800, Stephen Boyd wrote:
> > > Quoting Johan Hovold (2024-12-02 02:06:21)
> > > > This reverts commit 25f1c96a0e841013647d788d4598e364e5c2ebb7.
> > > >
> > > > The offending commit results in errors like
> > > >
> > > > cpu cpu0: _opp_config_clk_single: failed to set clock rate: -22
> > > >
> > > > spamming the logs on the Lenovo ThinkPad X13s and other Qualcomm
> > > > machines when cpufreq tries to update the CPUFreq HW Engine clocks.
> > > >
> > > > As mentioned in commit 4370232c727b ("cpufreq: qcom-hw: Add CPU clock
> > > > provider support"):
> > > >
> > > > [T]he frequency supplied by the driver is the actual frequency
> > > > that comes out of the EPSS/OSM block after the DCVS operation.
> > > > This frequency is not same as what the CPUFreq framework has set
> > > > but it is the one that gets supplied to the CPUs after
> > > > throttling by LMh.
> > > >
> > > > which seems to suggest that the driver relies on the previous behaviour
> > > > of clk_set_rate().
> > >
> > > I don't understand why a clk provider is needed there. Is anyone looking
> > > into the real problem?
> >
> > I mentioned this to Mani yesterday, but I'm not sure if he has had time
> > to look into it yet. And I forgot to CC Viresh who was involved in
> > implementing this. There is a comment of his in the thread where this
> > feature was added:
> >
> > Most likely no one will ever do clk_set_rate() on this new
> > clock, which is fine, though OPP core will likely do
> > clk_get_rate() here.
> >
> > which may suggest that some underlying assumption has changed. [1]
> >

Yikes.

>
> I just looked into the issue this morning. The commit that triggered the errors
> seems to be doing the right thing (although the commit message was a bit hard to
> understand), but the problem is this check which gets triggered now:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/clk.c?h=v6.13-rc1#n2319
>
> Since the qcom-cpufreq* clocks don't have parents now (they should've been
> defined anyway) and there is no CLK_SET_RATE_PARENT flag set, the check returns
> NULL for the 'top' clock. Then clk_core_set_rate_nolock() returns -EINVAL,
> causing the reported error.
>
> But I don't quite understand why clk_core_set_rate_nolock() fails if there is no
> parent or CLK_SET_RATE_PARENT is not set. The API is supposed to set the rate of
> the passed clock irrespective of the parent. Propagating the rate change to
> parent is not strictly needed and doesn't make sense if the parent is a fixed
> clock like XO.

The recalc_rate clk_op is telling the framework that the clk is at a
different rate than is requested by the clk consumer _and_ than what the
framework thinks the clk is currently running at. The clk_set_rate()
call is going to attempt to satisfy that request. Because there isn't a
determine_rate/round_rate clk_op, the framework assumes the clk can't
change rate, so it looks to see if there's a parent that can be changed
to satisfy the rate. There isn't a parent either, so the clk_set_rate()
call fails because the rate can't be achieved on this clk.

It may work to have a determine_rate clk_op that is like the recalc_rate
one that says "this rate you requested is going to turn into whatever
the hardware is running at" by simply returning the rate that the clk is
running at.
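
To make the reasoning above concrete, here is a rough user-space model of
the decision, not the actual clk.c code. Every name in it (struct
model_clk, model_set_rate(), model_determine_hw_rate()) is made up for
illustration, and the real framework does considerably more work (rate
requests, notifiers, locking). It only shows the two outcomes described
above: without a determine_rate op and without a usable parent the call
returns -EINVAL, while a determine_rate op that reports the rate the
hardware is running at lets the call succeed.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a clk (not struct clk_core). */
struct model_clk {
	unsigned long hw_rate;		/* rate the hardware is running at */
	struct model_clk *parent;	/* NULL when the clk has no parent */
	int set_rate_parent;		/* models the CLK_SET_RATE_PARENT flag */
	/* models the determine_rate clk_op; NULL when not implemented */
	int (*determine_rate)(struct model_clk *clk, unsigned long *rate);
};

/* Models a determine_rate op that turns any request into the hardware rate. */
static int model_determine_hw_rate(struct model_clk *clk, unsigned long *rate)
{
	*rate = clk->hw_rate;
	return 0;
}

/* Models the part of clk_set_rate() that decides whether a rate is reachable. */
static int model_set_rate(struct model_clk *clk, unsigned long req_rate)
{
	unsigned long rate = req_rate;
	int ret;

	if (req_rate == clk->hw_rate)
		return 0;	/* already at the requested rate */

	if (clk->determine_rate) {
		/* The clk says which rate it can provide for this request. */
		ret = clk->determine_rate(clk, &rate);
		if (ret)
			return ret;
		clk->hw_rate = rate;	/* no-op when the op returned hw_rate */
		return 0;
	}

	/* No way to round the rate: only a parent could satisfy the request. */
	if (!clk->parent || !clk->set_rate_parent)
		return -EINVAL;

	return model_set_rate(clk->parent, req_rate);
}
```

In this model a parentless clk with determine_rate left NULL fails any
request that differs from hw_rate, which mirrors the -22 in the report.
Pointing determine_rate at model_determine_hw_rate() makes the same
request return 0 and leaves hw_rate untouched.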