Re: [PATCH v2] clk: mediatek: clk-mux: Add .determine_rate() callback
From: Maxime Ripard
Date: Tue Oct 18 2022 - 03:07:13 EST
Hi Stephen,
On Fri, Oct 14, 2022 at 12:36:50PM -0700, Stephen Boyd wrote:
> Quoting Maxime Ripard (2022-10-12 06:56:19)
> >
> > I think we need to address this in multiple ways:
> >
> > - If you have ftrace enabled on that platform, running with "tp_printk
> > trace_event=clk:*" in the kernel command line on a failing kernel would
> > be great, so that we can figure out what is happening exactly.
> >
> > - We really need to merge your patch above as well;
> >
> > - And, if we can, we should forbid to register a mux without a
> > determine_rate implementation. We have to take into account that some
> > muxes are going to be RO and won't need a determine_rate
> > implementation at all, but a clock with a set_parent and without a
> > determine_rate seems like a weird combination.
> >
>
> So should I apply this patch now on clk-next? Given this regression I'm
> leaning towards not sending off the clk rate request this merge window
> and letting it bake another cycle.
I saw that you sent the PR, thanks
We spent some time with Angelo yesterday debugging this, and it's still
not clear to me what happens.
He provided a significant amount of traces, and we first checked that
the round_rate part of set_rate was returning something meaningful, and
it does. The rate is fine, the parent is too, everything's great.
Next we checked the decisions by clk_calc_new_rates, and it does return
the proper top clock, and its proper rate.
Finally, we hooked into clk_change_rate() to see what kind of decision
it was enforcing, and it seems to be ok as well. It doesn't change
parent, and it sets the proper rate, in both cases.
There's still one thing we haven't checked: one of the clock in the tree
(the parent of the one we want to change the rate on, and it has
SET_RATE_PARENT) has a notifier. As we've had a bug recently over this
I've not ruled out that this could be a similar bug.
I don't really think it is though, since the notifier callback doesn't
use the data provided by the framework:
https://elixir.bootlin.com/linux/v6.1-rc1/source/drivers/clk/mediatek/clk-mux.c#L279
I've pushed a branch for Angelo to test, just to confirm.
So... yeah. I can't explain the regression at this point. Do you have an
idea?
The good news is, since you merged this patch the regression is
invisible now to that platform. We still could encounter it on another
platform, but maybe it will also have a more obvious setup to replicate?
Thanks!
Maxime