Re: [PATCH] PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent

From: Heiko Stübner
Date: Tue Jun 23 2020 - 04:55:21 EST


Am Montag, 22. Juni 2020, 17:07:52 CEST schrieb Marc Zyngier:
> Hi Heiko,
>
> On 2020-06-22 14:54, Heiko Stübner wrote:
> > Hi Marc,
> >
> > Am Montag, 22. Juni 2020, 15:31:55 CEST schrieb Marc Zyngier:
> >> On Sat, 13 Jun 2020 11:24:35 +0100
> >> Marc Zyngier <maz@xxxxxxxxxx> wrote:
> >>
> >> > Booting a recent kernel on a rk3399-based system (nanopc-t4),
> >> > equipped with a recent u-boot and ATF results in the following:
> >> >
> >> > [ 5.607431] Unable to handle kernel NULL pointer dereference at virtual address 00000000000001e4
> >> > [ 5.608219] Mem abort info:
> >> > [ 5.608469] ESR = 0x96000004
> >> > [ 5.608749] EC = 0x25: DABT (current EL), IL = 32 bits
> >> > [ 5.609223] SET = 0, FnV = 0
> >> > [ 5.609600] EA = 0, S1PTW = 0
> >> > [ 5.609891] Data abort info:
> >> > [ 5.610149] ISV = 0, ISS = 0x00000004
> >> > [ 5.610489] CM = 0, WnR = 0
> >> > [ 5.610757] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e62fb000
> >> > [ 5.611326] [00000000000001e4] pgd=0000000000000000, p4d=0000000000000000
> >> > [ 5.611931] Internal error: Oops: 96000004 [#1] SMP
> >> > [ 5.612363] Modules linked in: rockchip_thermal(E+) rk3399_dmc(E+) soundcore(E) dw_wdt(E) rockchip_dfi(E) nvmem_rockchip_efuse(E) pwm_rockchip(E) cfg80211(E+) rockchip_saradc(E) industrialio(E) rfkill(E) cpufreq_dt(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) realtek(E) nvme(E) nvme_core(E) t10_pi(E) xhci_plat_hcd(E) xhci_hcd(E) rtc_rk808(E) rk808_regulator(E) clk_rk808(E) dwc3(E) udc_core(E) roles(E) ulpi(E) rk808(E)
> >> fan53555(E) rockchipdrm(E) analogix_dp(E) dw_hdmi(E) cec(E)
> >> dw_mipi_dsi(E) fixed(E) dwc3_of_simple(E) phy_rockchip_emmc(E)
> >> gpio_keys(E) drm_kms_helper(E) phy_rockchip_inno_usb2(E)
> >> ehci_platform(E) dwmac_rk(E) stmmac_platform(E) phy_rockchip_pcie(E)
> >> ohci_platform(E) ohci_hcd(E) rockchip_io_domain(E) stmmac(E)
> >> phy_rockchip_typec(E) ehci_hcd(E) sdhci_of_arasan(E) mdio_xpcs(E)
> >> sdhci_pltfm(E) cqhci(E) drm(E) sdhci(E) phylink(E) of_mdio(E)
> >> usbcore(E) i2c_rk3x(E) dw_mmc_rockchip(E) dw_mmc_pltfm(E) dw_mmc(E)
> >> fixed_phy(E) libphy(E)
> >> > [ 5.612454] pl330(E)
> >> > [ 5.620255] CPU: 1 PID: 270 Comm: systemd-udevd Tainted: G E 5.7.0-13692-g83ae758d8b22 #1157
> >> > [ 5.621110] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2020.07-rc4-00023-g10d4cafe0f 06/10/2020
> >> > [ 5.621947] pstate: 40000005 (nZcv daif -PAN -UAO BTYPE=--)
> >> > [ 5.622446] pc : regmap_read+0x1c/0x80
> >> > [ 5.622787] lr : rk3399_dmcfreq_probe+0x6a4/0x8c0 [rk3399_dmc]
> >> > [ 5.623299] sp : ffff8000126cb8a0
> >> > [ 5.623594] x29: ffff8000126cb8a0 x28: ffff8000126cbdb0
> >> > [ 5.624063] x27: ffff0000f22dac40 x26: ffff0000f6779800
> >> > [ 5.624533] x25: ffff0000f6779810 x24: 00000000ffffffea
> >> > [ 5.625002] x23: 00000000ffffffea x22: ffff0000f65b74c8
> >> > [ 5.625471] x21: ffff0000f783ca08 x20: ffff0000f65b7480
> >> > [ 5.625941] x19: 0000000000000000 x18: 0000000000000001
> >> > [ 5.626410] x17: 0000000000000000 x16: 0000000000000000
> >> > [ 5.626878] x15: ffff0000f22db138 x14: ffffffffffffffff
> >> > [ 5.627347] x13: 0000000000000018 x12: ffff80001106a8c7
> >> > [ 5.627817] x11: 0000000000000003 x10: 0101010101010101
> >> > [ 5.627861] systemd[1]: Found device SPCC M.2 PCIE SSD 3.
> >> > [ 5.628286] x9 : ffff800008d7c89c x8 : 7f7f7f7f7f7f7f7f
> >> > [ 5.629238] x7 : fefefeff646c606d x6 : 1c0e0e0ee3e8e9f0
> >> > [ 5.629709] x5 : 706968630e0e0e1c x4 : 8080808000000000
> >> > [ 5.630178] x3 : 937b1b5b1b434b80 x2 : ffff8000126cb944
> >> > [ 5.630648] x1 : 0000000000000308 x0 : 0000000000000000
> >> > [ 5.631119] Call trace:
> >> > [ 5.631346] regmap_read+0x1c/0x80
> >> > [ 5.631654] rk3399_dmcfreq_probe+0x6a4/0x8c0 [rk3399_dmc]
> >> > [ 5.632142] platform_drv_probe+0x5c/0xb0
> >> > [ 5.632500] really_probe+0xe4/0x448
> >> > [ 5.632819] driver_probe_device+0xfc/0x168
> >> > [ 5.633191] device_driver_attach+0x7c/0x88
> >> > [ 5.633567] __driver_attach+0xac/0x178
> >> > [ 5.633914] bus_for_each_dev+0x78/0xc8
> >> > [ 5.634261] driver_attach+0x2c/0x38
> >> > [ 5.634582] bus_add_driver+0x14c/0x230
> >> > [ 5.634925] driver_register+0x6c/0x128
> >> > [ 5.635269] __platform_driver_register+0x50/0x60
> >> > [ 5.635692] rk3399_dmcfreq_driver_init+0x2c/0x1000 [rk3399_dmc]
> >> > [ 5.636226] do_one_initcall+0x50/0x230
> >> > [ 5.636569] do_init_module+0x60/0x248
> >> > [ 5.636902] load_module+0x21f8/0x28d8
> >> > [ 5.637237] __do_sys_finit_module+0xb0/0x118
> >> > [ 5.637627] __arm64_sys_finit_module+0x28/0x38
> >> > [ 5.638031] el0_svc_common.constprop.0+0x7c/0x1f8
> >> > [ 5.638456] do_el0_svc+0x2c/0x98
> >> > [ 5.638754] el0_svc+0x18/0x48
> >> > [ 5.639029] el0_sync_handler+0x8c/0x2d4
> >> > [ 5.639378] el0_sync+0x158/0x180
> >> > [ 5.639680] Code: a9bd7bfd 910003fd a90153f3 aa0003f3 (b941e400)
> >> > [ 5.640221] ---[ end trace 63675fe5d0021970 ]---
> >> >
> >> > This turns out to be due to the rk3399-dmc driver looking for
> >> > an *undocumented* property (rockchip,pmu), and happily using
> >> > a NULL pointer when the property isn't there.
> >> >
> >> > The very existence of this driver in the kernel is highly doubtful
> >> > (I'd expect firmware to deal with this directly), but in the meantime
> >> > let's prevent it from oopsing the kernel at probe time if this
> >> > property isn't present.
> >
> > TF-A is handling the actual frequency scaling, this driver is like a
> > glorified wrapper around the TF-A interface ... and the dmc_clock it
> > calls is just this firmware-interface (see
> > drivers/clk/rockchip/clk-ddr.c)
> >
> > And I guess it also works around some missing Coreboot functionality.
> >
> > On u-boot we have ddr-timings in the uboot-devicetree which
> > _now_ after so many years finally also gets passed on to TF-A
> > but coreboot uses a completely different system, so I guess ChromeOS
> > used this to also tell TF-A about the actual ram configuration.
>
> It is all very convoluted... :-/
>
> >> > Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
> >> > ---
> >> > drivers/devfreq/rk3399_dmc.c | 17 ++++++++++-------
> >> > 1 file changed, 10 insertions(+), 7 deletions(-)
> >> >
> >> > diff --git a/drivers/devfreq/rk3399_dmc.c b/drivers/devfreq/rk3399_dmc.c
> >> > index 24f04f78285b..bee233a2e0ce 100644
> >> > --- a/drivers/devfreq/rk3399_dmc.c
> >> > +++ b/drivers/devfreq/rk3399_dmc.c
> >> > @@ -371,13 +371,16 @@ static int rk3399_dmcfreq_probe(struct platform_device *pdev)
> >> > }
> >> >
> >> > node = of_parse_phandle(np, "rockchip,pmu", 0);
> >> > - if (node) {
> >> > - data->regmap_pmu = syscon_node_to_regmap(node);
> >> > - of_node_put(node);
> >> > - if (IS_ERR(data->regmap_pmu)) {
> >> > - ret = PTR_ERR(data->regmap_pmu);
> >> > - goto err_edev;
> >> > - }
> >> > + if (!node) {
> >> > + ret = -ENODEV;
> >> > + goto err_edev;
> >> > + }
> >> > +
> >> > + data->regmap_pmu = syscon_node_to_regmap(node);
> >> > + of_node_put(node);
> >> > + if (IS_ERR(data->regmap_pmu)) {
> >> > + ret = PTR_ERR(data->regmap_pmu);
> >> > + goto err_edev;
> >> > }
> >> >
> >> > regmap_read(data->regmap_pmu, RK3399_PMUGRF_OS_REG2, &val);
> >>
> >>
> >> Any opinion on this patch? I can't believe I'm the only one hitting
> >> this.
> >
> > Looking at my test-kernel-config, I don't seem to actually build this
> > driver
> > though I'm also not using boards productively.
> >
> > But looking deeper, I'm either blind or nothing really carries a
> > dt-node
> > with a "rockchip,rk3399-dmc" compatible at all, so I'm wondering how
> > you
> > could actually hit it yourself :-) .
>
> maz@fine-girl:~$ sudo dtc -I dtb /sys/firmware/fdt 2>/dev/null | grep -A
> 5 dmc
> dmc {
> u-boot,dm-pre-reloc;
> compatible = "rockchip,rk3399-dmc";
> devfreq-events = <0xc8>;
>
> [followed by a ton of timings...]
>
> It is definitely coming from u-boot (I don't provide any DTB otherwise,
> and you can find the corresponding node and timings in the u-boot tree).

which is probably the source of the problem :-) .

I'm pretty sure the "reviewed" binding in the kernel doesn't match the
dt-nodes used in uboot.

While u-boot these days syncs the main devicetrees from Linux, the memory
setup stuff is pretty specific to uboot (and lives in separate dtsi files).

And I guess you're the only one feeding uboot's dtb to Linux directly, hence
nobody else did encounter this before ;-) .


Heiko


> > The change was introduced only last year with
> > commit 9173c5ceb035 ("PM / devfreq: rk3399_dmc: Pass ODT and auto
> > power down parameters to TF-A.")
> >
> > So adding that rockchip,pmu property should be optional to not break
> > dt-bindings, so I guess instead of erroring out, the one regmap_read
> > below
> > that (only one using the regmap) should actually just check for
> > regmap_pmu != NULL?
>
> But what does it mean then? Not finding the regmap means not finding
> the DDR type, in which case the best you can do is to hit the default
> case which errors out.
>
> So I went ahead and made the whole of 9173c5ceb035 depend on the
> regmap being available. I now get:
>
> [ 6.618548] rk3399-dmc-freq ffa80000.dmc: Invalid operating-points in
> device tree.
> [ 6.619256] rk3399-dmc-freq: probe of ffa80000.dmc failed with error
> -22
>
> which suits me perfectly.
>
> I'll post a new revision in a few minutes.
>
> Thanks,
>
> M.
>