Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

From: Saravana Kannan
Date: Wed Jan 20 2021 - 12:34:50 EST


On Wed, Jan 20, 2021 at 6:27 AM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
>
> Hi Saravana,
>
> On Wed, Jan 20, 2021 at 10:40 AM Geert Uytterhoeven
> <geert@xxxxxxxxxxxxxx> wrote:
> > On Tue, Jan 19, 2021 at 10:51 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > > On Tue, Jan 19, 2021 at 10:08 AM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > > > On Tue, Jan 19, 2021 at 1:05 AM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> > > > > On Mon, Jan 18, 2021 at 10:19 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > > > > > On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
> > > > > > <geert@xxxxxxxxxxxxxx> wrote:
> > > > > > > On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <maz@xxxxxxxxxx> wrote:
> > > > > > > > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > > > > > > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <saravanak@xxxxxxxxxx>
> > > > > > > > > wrote:
> > > > > > > > >> Cyclic dependencies in some firmware was one of the last remaining
> > > > > > > > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > > > > > > >> dependencies don't block probing, set fw_devlink=on by default.
> > > > > > > > >>
> > > > > > > > > >> Setting fw_devlink=on by default brings a bunch of benefits
> > > > > > > > > >> (currently, only for systems with device tree firmware):
> > > > > > > > >> * Significantly cuts down deferred probes.
> > > > > > > > >> * Device probe is effectively attempted in graph order.
> > > > > > > > >> * Makes it much easier to load drivers as modules without having to
> > > > > > > > >> worry about functional dependencies between modules (depmod is still
> > > > > > > > >> needed for symbol dependencies).
> > > > > > > > >>
> > > > > > > > >> If this patch prevents some devices from probing, it's very likely due
> > > > > > > > >> to the system having one or more device drivers that "probe"/set up a
> > > > > > > > >> device (DT node with compatible property) without creating a struct
> > > > > > > > >> device for it. If we hit such cases, the device drivers need to be
> > > > > > > > >> fixed so that they populate struct devices and probe them like normal
> > > > > > > > > >> device drivers so that the driver core is aware of the devices and
> > > > > > > > > >> their status. See [1] for an example of such a case.
> > > > > > > > >>
> > > > > > > > >> [1] -
> > > > > > > > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@xxxxxxxxxxxxxx/
> > > > > > > > >> Signed-off-by: Saravana Kannan <saravanak@xxxxxxxxxx>
> > > > > > > > >
> > > > > > > > > Shimoda-san reported that next-20210111 and later fail to boot
> > > > > > > > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > > > > > > > is enabled.
> > > > > > > > >
> > > > > > > > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > > > > > > > fw_devlink=on by default").
> > > > > > > >
> > > > > > > > There is a tentative patch from Saravana here[1], which works around
> > > > > > > > some issues on my RK3399 platform, and it'd be interesting to find
> > > > > > > > out whether that helps on your system.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > M.
> > > > > > > >
> > > > > > > > [1]
> > > > > > > > https://lore.kernel.org/r/20210116011412.3211292-1-saravanak@xxxxxxxxxx
> > > > > > >
> > > > > > > Thanks for the suggestion, but given no devices probe (incl. GPIO
> > > > > > > providers), I'm afraid it won't help. [testing] Indeed.
> > > > > > >
> > > > > > > With the debug prints in device_links_check_suppliers enabled, and
> > > > > > > some postprocessing, I get:
> > > > > > >
> > > > > > > 255 supplier e6180000.system-controller not ready
> > > > > > > 9 supplier fe990000.iommu not ready
> > > > > > > 9 supplier fe980000.iommu not ready
> > > > > > > 6 supplier febd0000.iommu not ready
> > > > > > > 6 supplier ec670000.iommu not ready
> > > > > > > 3 supplier febe0000.iommu not ready
> > > > > > > 3 supplier e7740000.iommu not ready
> > > > > > > 3 supplier e6740000.iommu not ready
> > > > > > > 3 supplier e65ee000.usb-phy not ready
> > > > > > > 3 supplier e6570000.iommu not ready
> > > > > > > 3 supplier e6054000.gpio not ready
> > > > > > > 3 supplier e6053000.gpio not ready
> > > > > > >
> > > > > > > As everything is part of a PM Domain, the (lack of the) system controller
> > > > > > > must be the culprit. What's wrong with it? It is registered very early in
> > > > > > > the boot:
> > > > > > >
> > > > > > > [ 0.142096] rcar_sysc_pd_init:442: of_genpd_add_provider_onecell() returned 0
> > > > >
> > > > > > Looks like you found the important logs. Can you please enable all
> > > > > > these logs and send the early con logs as an attachment (so I don't
> > > > > > need to deal with lines getting wrapped)?
> > > > > > 1. The ones in device_links_check_suppliers()
> > > > > > 2. The ones in device_link_add()
> > > > > > 3. initcall_debug=1
> > > > >
> > > > > I have attached[*] the requested log.
> > > > >
> > > > > > That should help us figure out what's going on. Also, what's the DT
> > > > > > that corresponds to one of the boards that see this issue?
> > > > >
> > > > > arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dts
> > > > >
> > > > > > Lastly, can you please pick up these 3 patches (some need clean up
> > > > > > before they merge) to make sure it's not an issue being worked on from
> > > > > > other bug reports?
> > > > > > https://lore.kernel.org/lkml/20210116011412.3211292-1-saravanak@xxxxxxxxxx/
> > > > > > https://lore.kernel.org/lkml/20210115210159.3090203-1-saravanak@xxxxxxxxxx/
> > > > > > https://lore.kernel.org/lkml/20201218210750.3455872-1-saravanak@xxxxxxxxxx/
> > > > > >
> > > > > > I have a strong hunch the 2nd one will fix your issues. fw_devlink can
> > > > > > handle cyclic dependencies now (it basically reverts to
> > > > > > fw_devlink=permissive mode for devices in the cycle), but it needs to
> > > > > > "see" all the dependencies to know there's a cycle. So want to make
> > > > > > sure it "sees" the "gpios" binding used all over some of the Renesas
> > > > > > DT files.
> > > > >
> > > > > These patches don't help.
> > > > > The 2nd one actually introduces a new failure:
> > > > >
> > > > > OF: /soc/i2c@e66d8000/gpio@20/pcie-sata-switch-hog: could not get
> > > > > #gpio-cells for /cpus/cpu@102
> > > > >
> > > > > Note that my issues don't seem to be GPIO-related at all.
> >
> > > I took a look at your logs. It looks like your guess is right. It's at
> > > least one of the issues.
> > >
> > > You'll need to convert drivers/soc/renesas/rcar-sysc.c into a platform
> > > driver. You already have a platform device created for it. So just go
> > > ahead and probe it with a platform driver. See what Marek did here
> > > [1].
> > >
> > > You probably had to implement it as an "initcall based driver"
> > > because you had to play initcall chicken to make sure the PD hardware
> > > was initialized before the consumers. With fw_devlink=on you won't
> > > have to worry about that. As an added benefit of implementing a proper
> > > platform driver, you can actually implement runtime PM now, your
> > > suspend/resume would be more robust, etc.
> >
> > On R-Car H1, the system controller driver needs to be active before
> > secondary CPU setup, hence the early_initcall().
> > platform_bus_init() is called after that, so this is gonna need a split
> > initialization. Or a dummy platform driver to make devlinks think
> > everything is fine ;-)

I was wondering if you could still probe the "not needed by CPU" power
domains (if there are any) as devices. Using driver-core brings you
good things :)

>
> Note that adding a dummy platform driver does work.
>
> > So basically all producer DT drivers not using a platform (or e.g. i2c)
> > driver are now broken?
> > Including all clock drivers using CLK_OF_DECLARE()?
>
> Oh, of_link_to_phandle() ignores device nodes where OF_POPULATED
> is set, and of_clk_init() sets that flag. So rcar-sysc should do so, too.
> Patch sent.
>
> > $ git grep -L "\<[a-z0-9]*_driver\>" -- $(git grep -l
> > "\.compatible\>") | wc -l
> > 249
> >
> > (includes false positives)
> >
> > I doubt they'll all get fixed for v5.12, as we're already at rc4...
>
> Still more than 100 drivers to fix?

Not fully sure what the grep is trying to catch, but fw_devlink
supports devices on any bus (i2c, platform, pci, etc.), so the bus type
is not the problem. It only becomes a problem when a struct device is
never created for a real device, or when it's created but never probed.
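
To make that distinction concrete, here is a toy userspace C model. It is
not kernel code, and every name in it (toy_node, toy_link_supplier,
toy_suppliers_ready) is made up for this sketch. It assumes only what is
described in this thread: a consumer waits until each linked supplier has
a device that has been probed, and no link is created to a supplier node
that is flagged as populated (the OF_POPULATED behaviour Geert mentions
above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SUPPLIERS 4

/* Toy stand-in for a DT node; NOT the kernel's struct device_node. */
struct toy_node {
	const char *name;
	bool has_device;	/* a struct device was created for this node */
	bool bound;		/* a driver has probed that device */
	bool populated;		/* models OF_POPULATED: skipped when linking */
	const struct toy_node *suppliers[MAX_SUPPLIERS];
	size_t nr_suppliers;
};

/* Record a consumer -> supplier dependency, unless the supplier is flagged. */
static void toy_link_supplier(struct toy_node *consumer,
			      const struct toy_node *supplier)
{
	if (supplier->populated)
		return;	/* no link is created, so it can never block probing */

	assert(consumer->nr_suppliers < MAX_SUPPLIERS);
	consumer->suppliers[consumer->nr_suppliers++] = supplier;
}

/* A consumer may probe only once every linked supplier is bound. */
static bool toy_suppliers_ready(const struct toy_node *consumer)
{
	for (size_t i = 0; i < consumer->nr_suppliers; i++) {
		const struct toy_node *s = consumer->suppliers[i];

		if (!s->has_device || !s->bound)
			return false;
	}
	return true;
}
```

In this model, a supplier that is set up from an initcall without ever
getting a device (has_device stays false) keeps all of its consumers
waiting forever, which would match the "supplier ... not ready" counts
earlier in the thread. Either probing it as a real device or flagging the
node as populated unblocks them.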

I'm also looking into a bunch of other options for fallback when
fw_devlink=on doesn't work. Too much to explain here -- patches are
easier :)

-Saravana