Re: next/master bisection: baseline.dmesg.emerg on meson-gxbb-p200

From: Marc Zyngier
Date: Thu Nov 19 2020 - 13:35:12 EST


On 2020-11-19 18:13, Jerome Brunet wrote:
On Thu 19 Nov 2020 at 19:04, Guillaume Tucker
<guillaume.tucker@xxxxxxxxxxxxx> wrote:

Hi Marc,

On 19/11/2020 11:58, Marc Zyngier wrote:
On 2020-11-19 10:26, Neil Armstrong wrote:
On 19/11/2020 11:20, Marc Zyngier wrote:
On 2020-11-19 08:50, Guillaume Tucker wrote:
Please see the automated bisection report below about some kernel
errors on meson-gxbb-p200.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org, however this one
looks valid.

The bisection started with next-20201118 but the errors are still
present in next-20201119. Details for this regression:

https://kernelci.org/test/case/id/5fb6196bfd0127fd68d8d902/

The first error is:

[ 14.757489] Internal error: synchronous external abort: 96000210
[#1] PREEMPT SMP

Looks like yet another clock ordering setup. I guess different Amlogic
platforms have slightly different ordering requirements.

Neil, do you have any idea of which platform requires which ordering?
The variability in DT and platforms is pretty difficult to follow (and
I don't think I have such board around).

The requirements should be the same, here the init was done before calling
dw_hdmi_probe to be sure the clocks and internals resets were deasserted.
But since you boot from u-boot already enabling these, it's already active.

The solution would be to revert and do some check in meson_dw_hdmi_init() to
check if already enabled and do nothing.

A better fix seems to be this, which makes it explicit that there is
a dependency between some of the registers accessed from meson_dw_hdmi_init()
and the iahb clock.

Guillaume, can you give this a go on your failing box?

I confirm it solves the problem. Please add this to your fix
patch if it's OK with you:

Reported-by: "kernelci.org bot" <bot@xxxxxxxxxxxx>
Tested-by: Guillaume Tucker <guillaume.tucker@xxxxxxxxxxxxx>


For the record, it passed all the tests when applied on top of
the "bad" revision found by the bisection:

http://lava.baylibre.com:10080/scheduler/alljobs?search=v5.10-rc3-1021-gb8668a2e5ea1

and the exact same test on the "bad" revision without the fix
consistently showed the error:

http://lava.baylibre.com:10080/scheduler/job/374176


Thanks,
Guillaume


diff --git a/drivers/gpu/drm/meson/meson_dw_hdmi.c b/drivers/gpu/drm/meson/meson_dw_hdmi.c
index 7f8eea494147..52af8ba94311 100644
--- a/drivers/gpu/drm/meson/meson_dw_hdmi.c
+++ b/drivers/gpu/drm/meson/meson_dw_hdmi.c
@@ -146,6 +146,7 @@ struct meson_dw_hdmi {
struct reset_control *hdmitx_ctrl;
struct reset_control *hdmitx_phy;
struct clk *hdmi_pclk;
+ struct clk *iahb_clk;
struct clk *venci_clk;
struct regulator *hdmi_supply;
u32 irq_stat;
@@ -1033,6 +1034,13 @@ static int meson_dw_hdmi_bind(struct device *dev, struct device *master,
}
clk_prepare_enable(meson_dw_hdmi->hdmi_pclk);

+ meson_dw_hdmi->iahb_clk = devm_clk_get(dev, "iahb");
+ if (IS_ERR(meson_dw_hdmi->iahb_clk)) {
+ dev_err(dev, "Unable to get iahb clk\n");
+ return PTR_ERR(meson_dw_hdmi->iahb_clk);
+ }
+ clk_prepare_enable(meson_dw_hdmi->iahb_clk);

If you guys are going ahead with this fix, this call to
clk_prepare_enable() needs to be balanced with clk_disable_unprepare() somehow

Yup, good point.

Although this driver *never* disables any clock it enables, and leaves it
to the main DW driver, which I guess makes it leak references.

So all 3 clocks need fixing.

Thanks,

M.
--
Jazz is not dead. It just smells funny...