Re: [PATCH v4 1/4] clk: fractional-divider: Export approximation algorithm to the CCF users

From: Andy Shevchenko
Date: Wed Sep 08 2021 - 06:53:29 EST


On Tue, Sep 07, 2021 at 09:17:47PM -0500, Chris Morgan wrote:
> On Tue, Sep 07, 2021 at 09:06:10PM +0300, Andy Shevchenko wrote:
> > On Tue, Sep 07, 2021 at 08:54:04PM +0300, Andy Shevchenko wrote:
> > > On Tue, Sep 07, 2021 at 10:44:00AM -0500, Chris Morgan wrote:

> > > > Unfortunately, I can confirm this breaks the DSI panel on the Rockchip
> > > > PX30 (and possibly other SoCs). Tested on my Odroid Go Advance. When
> > > > I revert 4e7cf74fa3b2 "clk: fractional-divider: Export approximation
> > > > algorithm to the CCF users" and 928f9e268611 "clk: fractional-divider:
> > > > Hide clk_fractional_divider_ops from wide audience" the panel begins
> > > > working again as expected on the master branch.
> > > >
> > > > It looks like an assumption is made in the vop_crtc_mode_fixup()
> > > > function in the rockchip_drm_vop.c that gets broken with this change.
> > > > Specifically, the function says in the comments "When DRM gives us a
> > > > mode, we should add 999 Hz to it.". I believe this is no longer true
> > > > after this clk change, and when I remove the + 999 from the function
> > > > the DSI panel works again. Note that I do not know the implications
> > > > of removing this 999 aside from that it fixes the DSI panel on my
> > > > PX30 after this change, so I don't know if it's a positive change
> > > > or not.
> > >
> > > Thank you for the report!
> > >
> > > I'll check this. Perhaps Heiko can help with testing as well on his side.
> >
> > On the first glance the mentioned patch may not be the culprit because it does
> > not change the functional behaviour (if I'm not mistaken). What really changes
> > it is the additional flag that removes the left-shift of the rate in the
> > calculations.
>
> I noticed the behavior on the 5.14 kernel was to set the numerator at
> an ungodly 7649082492112076800 and the denominator at 1 (no, that's not
> a typo). I think it tried to write 65535 to the register though, but it
> would go through this a few times and eventually settle on 1:1 as the
> fractional ratio (which I assume is all good, because that would work).
>
> Contrast this to the 5.15 behavior where it would try to set the ratio
> to 17001:17000, which would cause the DSI screen to fail to initalize.
>
> After tracing through the code I figured out that the VOP was trying to
> add 999 to the clock and set it at 17000999. 17000000/17000999 gives us
> 0, and subtracting 1 from that gives us a -1. The fls_long function
> would then return 64, and if we subtract 16 (the value of fd->mwidth
> for my board) it would tell us to shift the 17000999 48 bits to the
> left, which matches the ungodly large number.
>
> With the changes in 5.15 if I remove the + 999 from the VOP driver the
> clock then gets set at 17000000, since the parent is at 17000000 that
> gives us a 1:1 where everything works and everything is fine.
>
> Long story short I think this is a bug that's existed all along, and
> this change simply exposed it in a manner where it stopped working
> despite the bug being present. Unfortunately I neither know enough
> about the hardware to be confident in this fix beyond my specific
> board, nor do I have enough hardware to test it on anything except
> a Rockchip rk3326 with a DSI panel.

This is a very good analysis!

> > To me sounds like you found a proper solution to the issue and that +999 is
> > a hack against the (blindly?) copied part of the algorithm used in fractional
> > divider. Have you read the top comment in clk-fractional-divider.c? It should
> > explain how it works after my series.
>
> No, but I probably should read the docs more. I just stumbled on this
> series doing a bisect when the DSI panel stopped working.
>
> > In any case I'm not going to come to any conclusions right now and also want
> > to hear from people who have better understanding of this hardware.
>
> Yeah, I want to see what Heiko says after some more research, or anyone
> who has more familiarity with clocks/DRM than I do or who has more
> hardware to test on than I do.

After what I read above I can't add anything and what I think is the best
course of actions is to submit a patch with removal of +999 part and above
explanation. It would be nice to find the real commit ID that may be used
for a Fixes tag.

Then we at least will have a patch ready in case it's considered correct
by people from Rockchip side.

> I intended to send a message informing you that "hey, this breaks
> upstream", but I think it turns out it's more a matter of "hey,
> this makes a broken upstream break instead of limp along".

Understand.

--
With Best Regards,
Andy Shevchenko