Re: [PATCH 0/1] clk: Meson8/8b/8m2: fix the mali clock flags

From: Stephen Boyd
Date: Mon Dec 16 2019 - 12:50:17 EST


Quoting Jerome Brunet (2019-12-16 01:13:31)
>
> On Sun 15 Dec 2019 at 22:01, Martin Blumenstingl <martin.blumenstingl@xxxxxxxxxxxxxx> wrote:
>
> > While playing with devfreq support for the lima driver I experienced
> > sporadic (random) system lockups. It turned out that this was in
> > certain cases when changing the mali clock.
> >
> > The Amlogic vendor GPU platform driver (which is responsible for
> > changing the clock frequency) uses the following pattern when updating
> > the mali clock rate:
> > - at initialization: initialize the two mali_0 and mali_1 clock trees
> > with a default setting and enable both clocks
> > - when changing the clock frequency:
> > -- set HHI_MALI_CLK_CNTL[31] to temporarily use the mali_1 clock output
> > -- update the mali_0 clock tree (set the mux, divider, etc.)
> > -- clear HHI_MALI_CLK_CNTL[31] to temporarily use the mali_0 clock
> ^ no final setting then ? :P
> > output again
> >
> > With the common clock framework we can even do better:
> > by setting CLK_SET_RATE_PARENT for the mali_0 and mali_1 output gates
> ^
> From your patch, I guess you mean CLK_SET_RATE_GATE ?
>
> > we can force the common clock framework to update the "inactive" clock
> > and then switch to its output.
> >
> > I tested this patch for a limited time only (approx. 2 hours).
> > So far I couldn't reproduce the sporadic system lockups with it.
> > However, broader testing would be great so I would like this to be
> > applied for -next.
>
> CLK_SET_RATE_GATE guarantees that a clock cannot be updated while in
> use. While it works to your advantage here, I'm not sure CCF guarantees
> the assumption this implementation is based on. Some explanation below:
>
> In your case, if it works as you expect when calling set_rate() on the
> top clock, it goes as this:
>
> - mali0 is in use with rate X:
> - => set_rate(mali_top, Y)
> - mali0 is in use, cannot change, will round rate Y to X
> - mali1 is not in use, can provide Y
> - mali1 is determined to be the new best parent for mali top
>
> So far so good.
>
> - CCF picks the mali1 subtree
> *start updating the clock from the root to the leaf*
>
> So the mali top mux, which chooses between mali0 and mali1, will be
> *updated last*, which is crucial to your use case.
>
> I just wonder if this crucial part is something CCF guarantees and you
> can rely on it ... or if it might break in the future.
>
> Stephen, any thoughts on this ?

We have problems with the order in which we call the set_rate clk_op.
Sometimes clk providers want us to call from leaf to root but instead we
call from root to leaf because of implementation reasons. Controlling
the order in which clk operations are done is an unsolved problem. But
yes, in the future I'd like to see us introduce the vaporware that is
coordinated clk rates that would allow clk providers to decide what this
order should be, instead of having to do this "root-to-leaf" update.
Doing so would help us with clk dividers whose parent changes rate:
because we change the parent before the divider, the downstream device
can be briefly overclocked.
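
To put numbers on the divider case, here is a minimal user-space sketch.
It is a toy model, not CCF code: struct toy_clk and the toy_*() helpers
are hypothetical names, and the rates are made up for illustration. It
records the highest output rate seen while moving a parent/divider pair
to a new setting in either order.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: one parent clock feeding one integer divider. Not kernel code. */
struct toy_clk {
	uint64_t parent_hz;	/* rate of the parent (root side) */
	uint32_t div;		/* divider value (leaf side) */
};

static uint64_t toy_rate(const struct toy_clk *c)
{
	assert(c->div != 0);
	return c->parent_hz / c->div;
}

static uint64_t toy_max(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

/* Parent first, then divider. Returns the highest output rate observed. */
static uint64_t toy_set_root_to_leaf(struct toy_clk *c, uint64_t parent_hz,
				     uint32_t div)
{
	uint64_t peak = toy_rate(c);

	c->parent_hz = parent_hz;
	peak = toy_max(peak, toy_rate(c));	/* new parent, old divider */
	c->div = div;
	return toy_max(peak, toy_rate(c));
}

/* Divider first, then parent. Returns the highest output rate observed. */
static uint64_t toy_set_leaf_to_root(struct toy_clk *c, uint64_t parent_hz,
				     uint32_t div)
{
	uint64_t peak = toy_rate(c);

	c->div = div;
	peak = toy_max(peak, toy_rate(c));	/* old parent, new divider */
	c->parent_hz = parent_hz;
	return toy_max(peak, toy_rate(c));
}
```

Starting from a 100 MHz parent with div 4 (25 MHz out) and moving to a
200 MHz parent with div 8 (still 25 MHz out), the root-to-leaf order
passes through 200 MHz / 4 = 50 MHz, while the leaf-to-root order never
exceeds 25 MHz. When the parent rate goes down instead, the safe order
flips, which is why the provider would want to pick the order.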

If there are more assumptions like this about how the CCF is implemented
then we'll have to be extra careful to not disturb the "normal" order of
operations when introducing something that allows clk providers to
modify it.
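
The ordering assumption from the quoted discussion can be made concrete
with a small user-space sketch. Again this is a toy model and not CCF
code: struct toy_mali and the toy_*() helpers are hypothetical, the two
array slots merely stand in for the mali_0 and mali_1 branches, and the
rates are arbitrary. Each helper reports whether the output only ever
showed the old or the new rate.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of two branches behind a final mux. Not kernel code. */
struct toy_mali {
	uint64_t rate_hz[2];	/* stand-ins for the mali_0 / mali_1 branches */
	int sel;		/* branch currently selected by the top mux */
};

static uint64_t toy_output(const struct toy_mali *m)
{
	assert(m->sel == 0 || m->sel == 1);
	return m->rate_hz[m->sel];
}

/* Reprogram the idle branch first, flip the mux last. Returns 1 if clean. */
static int toy_set_rate_mux_last(struct toy_mali *m, uint64_t new_hz)
{
	uint64_t old_hz = toy_output(m);
	int idle = !m->sel;
	int clean;

	m->rate_hz[idle] = new_hz;		/* idle branch updated first */
	clean = (toy_output(m) == old_hz);	/* output still on old branch */
	m->sel = idle;				/* mux updated last */
	return clean && toy_output(m) == new_hz;
}

/* Flip the mux first, then reprogram the now-active branch. */
static int toy_set_rate_mux_first(struct toy_mali *m, uint64_t new_hz)
{
	uint64_t old_hz = toy_output(m);
	int idle = !m->sel;
	uint64_t seen;

	m->sel = idle;				/* mux updated first */
	seen = toy_output(m);			/* stale rate of the idle branch */
	m->rate_hz[idle] = new_hz;
	return (seen == old_hz || seen == new_hz) &&
	       toy_output(m) == new_hz;
}
```

With branch 0 active at 300 MHz, branch 1 idle at 100 MHz and a request
for 400 MHz, the mux-last order reports a clean switch, while the
mux-first order exposes the stale 100 MHz setting in between. The trick
only holds as long as the mux really is the last thing touched.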

Also, isn't CLK_SET_RATE_GATE broken in the case that clk_set_rate()
isn't called on that particular clk? I seem to recall that the flag only
matters when it's applied to the "leaf" or entry point into the CCF from
a consumer API. I've wanted to fix that but never gotten around to it.
The whole flag sort of irks me because I don't understand what consumers
are supposed to do when this flag is set on a clk. How do they discover
it? They're supposed to "just know" and turn off the clk first and then
call clk_set_rate()? Why can't the framework do this all in the
clk_set_rate() call?
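
As a rough illustration of what the consumer ends up doing, here is a
user-space sketch. It is a toy model under stated assumptions, not the
framework's implementation: struct toy_gated_clk, TOY_SET_RATE_GATE and
the toy_*() helpers are hypothetical, and returning -EBUSY for a running
clk is a choice made for this model. The comments note which real
consumer calls each step stands in for.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SET_RATE_GATE 0x1	/* stand-in for CLK_SET_RATE_GATE */

/* Toy clk with a flag, an enable count and a rate. Not kernel code. */
struct toy_gated_clk {
	unsigned int flags;
	unsigned int enable_count;
	uint64_t rate_hz;
};

/* Refuse a rate change while a "must be gated" clk is running. */
static int toy_set_rate(struct toy_gated_clk *c, uint64_t hz)
{
	assert(c != NULL);
	if ((c->flags & TOY_SET_RATE_GATE) && c->enable_count)
		return -EBUSY;
	c->rate_hz = hz;
	return 0;
}

/* What the consumer has to "just know": gate, change the rate, ungate. */
static int toy_consumer_change_rate(struct toy_gated_clk *c, uint64_t hz)
{
	unsigned int saved = c->enable_count;
	int ret;

	c->enable_count = 0;		/* models clk_disable_unprepare() */
	ret = toy_set_rate(c, hz);
	c->enable_count = saved;	/* models clk_prepare_enable() */
	return ret;
}
```

In this model a plain toy_set_rate() on a running clk fails, and only
the wrapper that gates the clk first succeeds. Nothing in the consumer
API tells the caller that the wrapper is needed.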

>
> PS: If CCF does guarantee "root-to-leaf" updates, I think this
> implementation is a clever trick to solve this usual glitch-free clock
> update issue ... much more elegant than the notifier solution we have
> been using so far.