Re: [PATCH 1/2] irqchip: irq-meson-gpio: make it possible to build as a module

From: Saravana Kannan
Date: Fri Aug 06 2021 - 20:00:51 EST


On Wed, Aug 4, 2021 at 11:31 PM Neil Armstrong <narmstrong@xxxxxxxxxxxx> wrote:
>
> Hi Saravana,
>
> On 04/08/2021 23:47, Saravana Kannan wrote:
> > On Wed, Aug 4, 2021 at 11:20 AM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> >>
> >> On Wed, Aug 4, 2021 at 1:50 AM Marc Zyngier <maz@xxxxxxxxxx> wrote:
> >>>
> >>> On Wed, 04 Aug 2021 02:36:45 +0100,
> >>> Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> >>>
> >>> Hi Saravana,
> >>>
> >>> Thanks for looking into this.
> >>
> >> You are welcome. I just don't want people to think fw_devlink is broken :)
> >>
> >>>
> >>> [...]
> >>>
> >>>>> Saravana, could you please have a look from a fw_devlink perspective?
> >>>>
> >>>> Sigh... I spent several hours looking at this and wrote up an analysis
> >>>> and then realized I might be looking at the wrong DT files.
> >>>>
> >>>> Marc, can you point me to the board file in upstream that corresponds
> >>>> to the platform in which you see this issue? I'm not asking for [1],
> >>>> but the actual final .dts (not .dtsi) file that corresponds to the
> >>>> platform/board/system.
> >>>
> >>> The platform I can reproduce this on is described in
> >>> arch/arm64/boot/dts/amlogic/meson-sm1-khadas-vim3l.dts. It is an
> >>> intricate maze of inclusion, node merge and other DT subtleties. I
> >>> suggest you look at the decompiled version to get a view of the
> >>> result.
> >>
> >> Thanks. After decompiling it, it looks something like (stripped a
> >> bunch of reg and address properties and added the labels back):
> >>
> >> eth_phy: mdio-multiplexer@4c000 {
> >> compatible = "amlogic,g12a-mdio-mux";
> >> clocks = <0x02 0x13 0x1e 0x02 0xb1>;
> >> clock-names = "pclk\0clkin0\0clkin1";
> >> mdio-parent-bus = <0x22>;
> >>
> >> ext_mdio: mdio@0 {
> >> reg = <0x00>;
> >>
> >> ethernet-phy@0 {
> >> max-speed = <0x3e8>;
> >> interrupt-parent = <0x23>;
> >> interrupts = <0x1a 0x08>;
> >> phandle = <0x16>;
> >> };
> >> };
> >>
> >> int_mdio: mdio@1 {
> >> ...
> >> }
> >> }
> >>
> >> And phandle 0x23 refers to the gpio_intc interrupt controller with the
> >> modular driver.
> >>
> >>>> Based on your error messages, it's failing for mdio@0 which
> >>>> corresponds to ext_mdio. But none of the board dts files in upstream
> >>>> have a compatible property for "ext_mdio". Which means fw_devlink
> >>>> _should_ propagate the gpio_intc IRQ dependency all the way up to
> >>>> eth_phy.
> >>>>
> >>>> Also, in the failing case, can you run:
> >>>> ls -ld supplier:*
> >>>>
> >>>> in the /sys/devices/....<something>/ folder that corresponds to the
> >>>> "eth_phy: mdio-multiplexer@4c000" DT node and tell me what it shows?
> >>>
> >>> Here you go:
> >>>
> >>> root@tiger-roach:~# find /sys/devices/ -name 'supplier*'|grep -i mdio | xargs ls -ld
> >>> lrwxrwxrwx 1 root root 0 Aug 4 09:47 /sys/devices/platform/soc/ff600000.bus/ff64c000.mdio-multiplexer/supplier:platform:ff63c000.system-controller:clock-controller -> ../../../../virtual/devlink/platform:ff63c000.system-controller:clock-controller--platform:ff64c000.mdio-multiplexer
> >>
> >> As we discussed over chat, this was taken after the mdio-multiplexer
> >> driver "successfully" probes this device. This will cause
> >> SYNC_STATE_ONLY device links created by fw_devlink to be deleted
> >> (because they are useless after a device probes). So, this doesn't
> >> show the info I was hoping to demonstrate.
> >>
> >> In any case, one can see that fw_devlink properly created the device
> >> link for the clocks dependency. So fw_devlink is parsing this node
> >> properly. But it doesn't create a similar probe order enforcing device
> >> link between the mdio-multiplexer and the gpio_intc because the
> >> dependency is only present in a grand child DT node (ethernet-phy@0
> >> under ext_mdio). So fw_devlink is working as intended.
> >>
> >> I spent several hours squinting at the code/DT yesterday. Here's what
> >> is going on and causing the problem:
> >>
> >> The failing driver in this case is
> >> drivers/net/mdio/mdio-mux-meson-g12a.c. And the only DT node it's
> >> handling is what I pasted above in this email. In the failure case,
> >> the call flow is something like this:
> >>
> >> g12a_mdio_mux_probe()
> >> -> mdio_mux_init()
> >> -> of_mdiobus_register(ext_mdio DT node)
> >> -> of_mdiobus_register_phy(ext_mdio DT node)
> >> -> several calls deep fwnode_mdiobus_phy_device_register(ethernet_phy DT node)
> >> -> Tried to get the IRQ listed in ethernet_phy and fails with
> >> -EPROBE_DEFER because the IRQ driver isn't loaded yet.
> >>
> >> The error is propagated correctly all the way up to of_mdiobus_register(), but
> >> mdio_mux_init() ignores the -EPROBE_DEFER from of_mdiobus_register() and just
> >> continues on with the rest of the stuff and returns success as long as
> >> one of the child nodes (in this case int_mdio) succeeds.
> >>
> >> Since the probe returns 0 without really succeeding, networking stuff
> >> just fails badly after this. So, IMO, the real problem is with
> >> mdio_mux_init() not propagating up the -EPROBE_DEFER. I gave Marc a
> >> quick hack (pasted at the end of this email) to test my theory and he
> >> confirmed that it fixes the issue (a few deferred probes later, things
> >> work properly).
> >>
> >> Andrew, I don't see any good reason for mdio_mux_init() not
> >> propagating the errors up correctly (at least for EPROBE_DEFER). I'll
> >> send a patch to fix this. Please let me know if there's a reason it
> >> has to stay as-is.
> >
> > I sent out the proper fix as a series:
> > https://lore.kernel.org/lkml/20210804214333.927985-1-saravanak@xxxxxxxxxx/T/#t
>
> Thanks a lot for digging here and providing the appropriate fixes !

You are welcome!

Btw, 'm too lazy to download the mbox for your original patch
(justifiably not cc'ed in it) and reply to it. I made this comment
earlier too.

Can you please use the IRQCHIP_PLATFORM_DRIVER_BEGIN and
IRQCHIP_PLATFORM_DRIVER_END macros? They avoid boilerplate code every
irqchip driver has to implement, adds some restrictions to avoid
unbinding these drivers/unloading these modules, and also makes it
easy to convert from IRQCHIP_DECLARE to a platform driver. It'll also
allow you to drop the of_irq_find_parent() call in your probe.

Cheers,
Saravana

>
> Neil
>
> >
> > Marc, can you give it a shot please?
> >
> > -Saravana
> >
> >>
> >> -Saravana
> >>
> >> index 110e4ee85785..d973a267151f 100644
> >> --- a/drivers/net/mdio/mdio-mux.c
> >> +++ b/drivers/net/mdio/mdio-mux.c
> >> @@ -170,6 +170,9 @@ int mdio_mux_init(struct device *dev,
> >> child_bus_node);
> >> mdiobus_free(cb->mii_bus);
> >> devm_kfree(dev, cb);
> >> + /* Not a final fix. I think it can cause UAF issues. */
> >> + mdio_mux_uninit(pb);
> >> + return r;
> >> } else {
> >> cb->next = pb->children;
> >> pb->children = cb;
>