Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

From: Geert Uytterhoeven
Date: Thu Jan 21 2021 - 11:08:37 EST


Hi Saravana,

On Wed, Jan 20, 2021 at 6:23 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> On Wed, Jan 20, 2021 at 6:27 AM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> > On Wed, Jan 20, 2021 at 10:40 AM Geert Uytterhoeven
> > <geert@xxxxxxxxxxxxxx> wrote:
> > > On Tue, Jan 19, 2021 at 10:51 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > > > On Tue, Jan 19, 2021 at 10:08 AM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > > > > On Tue, Jan 19, 2021 at 1:05 AM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> > > > > > On Mon, Jan 18, 2021 at 10:19 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > > > > > > On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
> > > > > > > <geert@xxxxxxxxxxxxxx> wrote:
> > > > > > > > On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <maz@xxxxxxxxxx> wrote:
> > > > > > > > > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > > > > > > > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <saravanak@xxxxxxxxxx>
> > > > > > > > > > wrote:
> > > > > > > > > >> Cyclic dependencies in some firmware was one of the last remaining
> > > > > > > > > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > > > > > > > >> dependencies don't block probing, set fw_devlink=on by default.
> > > > > > > > > >>
> > > > > > > > > >> Setting fw_devlink=on by default brings a bunch of benefits
> > > > > > > > > >> (currently,
> > > > > > > > > >> only for systems with device tree firmware):
> > > > > > > > > >> * Significantly cuts down deferred probes.
> > > > > > > > > >> * Device probe is effectively attempted in graph order.
> > > > > > > > > >> * Makes it much easier to load drivers as modules without having to
> > > > > > > > > >> worry about functional dependencies between modules (depmod is still
> > > > > > > > > >> needed for symbol dependencies).
> > > > > > > > > >>
> > > > > > > > > >> If this patch prevents some devices from probing, it's very likely due
> > > > > > > > > >> to the system having one or more device drivers that "probe"/set up a
> > > > > > > > > >> device (DT node with compatible property) without creating a struct
> > > > > > > > > >> device for it. If we hit such cases, the device drivers need to be
> > > > > > > > > >> fixed so that they populate struct devices and probe them like normal
> > > > > > > > > >> device drivers so that the driver core is aware of the devices and
> > > > > > > > > >> their
> > > > > > > > > >> status. See [1] for an example of such a case.
> > > > > > > > > >>
> > > > > > > > > >> [1] -
> > > > > > > > > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@xxxxxxxxxxxxxx/
> > > > > > > > > >> Signed-off-by: Saravana Kannan <saravanak@xxxxxxxxxx>
> > > > > > > > > >
> > > > > > > > > > Shimoda-san reported that next-20210111 and later fail to boot
> > > > > > > > > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > > > > > > > > is enabled.
> > > > > > > > > >
> > > > > > > > > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > > > > > > > > fw_devlink=on by default").

> > > > You'll need to convert drivers/soc/renesas/rcar-sysc.c into a platform
> > > > driver. You already have a platform device created for it. So just go
> > > > ahead and probe it with a platform driver. See what Marek did here
> > > > [1].
> > > >
> > > > You probably had to implement it as an "initcall based driver"
> > > > because you had to play initcall chicken to make sure the PD hardware
> > > > was initialized before the consumers. With fw_devlink=on you won't
> > > > have to worry about that. As an added benefit of implementing a proper
> > > > platform driver, you can actually implement runtime PM now, your
> > > > suspend/resume would be more robust, etc.
> > >
> > > On R-Car H1, the system controller driver needs to be active before
> > > secondary CPU setup, hence the early_initcall().
> > > platform_bus_init() is called after that, so this is gonna need a split
> > > initialization. Or a dummy platform driver to make devlinks think
> > > everything is fine ;-)
>
> I was wondering if you could still probe the "not needed by CPU" power
> domains (if there are any) as devices. Using driver-core brings you
> good things :)

1. That would mean splitting the driver in two parts, looping over the
tables twice, while everything can just be done in the first pass?

2. Which "good things" do you have in mind? Making the driver modular?
Ignoring the dependency for secondary CPU setup on R-Car H1, this
driver could indeed be modular on R-Car Gen2 and Gen3, as long as
the boot loader would pass a ramdisk with the module to the kernel.
The ramdisk could not be loaded in any other way, as all I/O
devices are part of a PM Domain, and thus depend on the SYSC driver.
Note that on some (non-R-Car) SoCs, the timers may be part of a PM
Domain, too.

> > > So basically all producer DT drivers not using a platform (or e.g. i2c)
> > > driver are now broken?
> > > Including all clock drivers using CLK_OF_DECLARE()?
> >
> > Oh, of_link_to_phandle() ignores device nodes where OF_POPULATED
> > is set, and of_clk_init() sets that flag. So rcar-sysc should do so, too.
> > Patch sent.
> > > $ git grep -L "\<[a-z0-9]*_driver\>" -- $(git grep -l
> > > "\.compatible\>") | wc -l
> > > 249
> > >
> > > (includes false positives)
> > >
> > > I doubt they'll all get fixed for v5.12, as we're already at rc4...
> >
> > Still more than 100 drivers to fix?
>
> Not fully sure what the grep is trying to catch, but fw_devlink
> supports devices on any bus (i2c, platform, pci, etc). So that's not a
> problem. It'll be a problem when a struct device is never created for
> a real device. Or if it's created, but never probed.

The grep tries to catch drivers using DT matching (i.e. matching ".compatible")
and not using a driver model driver (i.e. not matching "*_driver").

> I'm also looking into a bunch of other options for fallback when
> fw_devlink=on doesn't work. Too much to explain here -- patches are
> easier :)

I gave it a try on all Renesas platforms I have local access to:

- R-Car Gen2/Gen3:
Setting OF_POPULATED in the rcar-sysc driver[1] made my standard
config boot again. Remaining issues:
- CONFIG_IPMMU_VMSA=n hangs: supplier fe990000.iommu not ready
- CONFIG_RCAR_DMAC=n hangs: supplier e7310000.dma-controller not ready
Note that Ethernet does not use the R-Car DMAC, so DHCP works.
Nevertheless, after that everything hangs, and the board does not
respond to pings anymore
Both IOMMU and DMAC dependencies are optional, hence should be dropped
at late boot (late_initcall?).

- SH-Mobile AG5 and R-Mobile APE6:
The rmobile-sysc driver is similar to the rcar-sysc driver, and does
not use a platform device.
Still, it works, because all dependencies on the System Controller
become unblocked when the rmobile-reset driver binds against the
"renesas,sysc-rmobile" device. Obviously it would fail if no
support for that driver is included in your kernel...

- R-Mobile A1:
Also using the rmobile-sysc driver.
However, this is a single core Cortex-A9, i.e. it does not have an
ARM architectured timer (like R-Mobile APE6) or Cortex-A9 Global
Timer (like SH-Mobile AG5). The timer used (TMU) is located in a PM
Domain controlled by the rmobile-sysc driver, and driver
initialization is postponed beyond the point where something relies
on a working timer, causing a hang.

Setting OF_POPULATED (like in my fix for the rcar-sysc driver) fixes
this, but prevents the rmobile-reset driver from binding against the
same device node, so the reset handling will have to be incorporated
into the rmobile-sysc driver (and will thus be registered very
early).

- RZ/A1 and RZ/A2:
These are not affected, as the timer used (OSTM) is not a platform
driver, but uses TIMER_OF_DECLARE().
Note that the RZ/A2 clock driver uses split initialization:
1. Early (timer) clocks are initialized from CLK_OF_DECLARE_DRIVER,
2. Other clocks are initialized by platform_driver_probe() from a
subsys_initcall.
If the OSTM driver would be a platform_driver, it would block on the
block dependency. Setting the OF_POPULATED flag in the clock driver
would not work: while that flag would unblock probing of the timer
driver, it would also prevent the second part of the clock driver
initialization.

Now, back to the things I was supposed to work on this week ;-)

[1] https://lore.kernel.org/linux-arm-kernel/20210120142323.2203705-1-geert+renesas@xxxxxxxxx/


Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds