Re: [PATCH 1/2] irqchip: irq-meson-gpio: make it possible to build as a module

From: Neil Armstrong
Date: Thu Aug 05 2021 - 02:31:19 EST


Hi Saravana,

On 04/08/2021 23:47, Saravana Kannan wrote:
> On Wed, Aug 4, 2021 at 11:20 AM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
>>
>> On Wed, Aug 4, 2021 at 1:50 AM Marc Zyngier <maz@xxxxxxxxxx> wrote:
>>>
>>> On Wed, 04 Aug 2021 02:36:45 +0100,
>>> Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
>>>
>>> Hi Saravana,
>>>
>>> Thanks for looking into this.
>>
>> You are welcome. I just don't want people to think fw_devlink is broken :)
>>
>>>
>>> [...]
>>>
>>>>> Saravana, could you please have a look from a fw_devlink perspective?
>>>>
>>>> Sigh... I spent several hours looking at this and wrote up an analysis
>>>> and then realized I might be looking at the wrong DT files.
>>>>
>>>> Marc, can you point me to the board file in upstream that corresponds
>>>> to the platform in which you see this issue? I'm not asking for [1],
>>>> but the actual final .dts (not .dtsi) file that corresponds to the
>>>> platform/board/system.
>>>
>>> The platform I can reproduce this on is described in
>>> arch/arm64/boot/dts/amlogic/meson-sm1-khadas-vim3l.dts. It is an
>>> intricate maze of inclusion, node merge and other DT subtleties. I
>>> suggest you look at the decompiled version to get a view of the
>>> result.
>>
>> Thanks. After decompiling it, it looks something like (stripped a
>> bunch of reg and address properties and added the labels back):
>>
>> eth_phy: mdio-multiplexer@4c000 {
>> compatible = "amlogic,g12a-mdio-mux";
>> clocks = <0x02 0x13 0x1e 0x02 0xb1>;
>> clock-names = "pclk\0clkin0\0clkin1";
>> mdio-parent-bus = <0x22>;
>>
>> ext_mdio: mdio@0 {
>> reg = <0x00>;
>>
>> ethernet-phy@0 {
>> max-speed = <0x3e8>;
>> interrupt-parent = <0x23>;
>> interrupts = <0x1a 0x08>;
>> phandle = <0x16>;
>> };
>> };
>>
>> int_mdio: mdio@1 {
>> ...
>> }
>> }
>>
>> And phandle 0x23 refers to the gpio_intc interrupt controller with the
>> modular driver.
>>
>>>> Based on your error messages, it's failing for mdio@0 which
>>>> corresponds to ext_mdio. But none of the board dts files in upstream
>>>> have a compatible property for "ext_mdio". Which means fw_devlink
>>>> _should_ propagate the gpio_intc IRQ dependency all the way up to
>>>> eth_phy.
>>>>
>>>> Also, in the failing case, can you run:
>>>> ls -ld supplier:*
>>>>
>>>> in the /sys/devices/....<something>/ folder that corresponds to the
>>>> "eth_phy: mdio-multiplexer@4c000" DT node and tell me what it shows?
>>>
>>> Here you go:
>>>
>>> root@tiger-roach:~# find /sys/devices/ -name 'supplier*'|grep -i mdio | xargs ls -ld
>>> lrwxrwxrwx 1 root root 0 Aug 4 09:47 /sys/devices/platform/soc/ff600000.bus/ff64c000.mdio-multiplexer/supplier:platform:ff63c000.system-controller:clock-controller -> ../../../../virtual/devlink/platform:ff63c000.system-controller:clock-controller--platform:ff64c000.mdio-multiplexer
>>
>> As we discussed over chat, this was taken after the mdio-multiplexer
>> driver "successfully" probes this device. This will cause
>> SYNC_STATE_ONLY device links created by fw_devlink to be deleted
>> (because they are useless after a device probes). So, this doesn't
>> show the info I was hoping to demonstrate.
>>
>> In any case, one can see that fw_devlink properly created the device
>> link for the clocks dependency. So fw_devlink is parsing this node
>> properly. But it doesn't create a similar probe order enforcing device
>> link between the mdio-multiplexer and the gpio_intc because the
>> dependency is only present in a grand child DT node (ethernet-phy@0
>> under ext_mdio). So fw_devlink is working as intended.
>>
>> I spent several hours squinting at the code/DT yesterday. Here's what
>> is going on and causing the problem:
>>
>> The failing driver in this case is
>> drivers/net/mdio/mdio-mux-meson-g12a.c. And the only DT node it's
>> handling is what I pasted above in this email. In the failure case,
>> the call flow is something like this:
>>
>> g12a_mdio_mux_probe()
>> -> mdio_mux_init()
>> -> of_mdiobus_register(ext_mdio DT node)
>> -> of_mdiobus_register_phy(ext_mdio DT node)
>> -> several calls deep fwnode_mdiobus_phy_device_register(ethernet_phy DT node)
>> -> Tried to get the IRQ listed in ethernet_phy and fails with
>> -EPROBE_DEFER because the IRQ driver isn't loaded yet.
>>
>> The error is propagated correctly all the way up to of_mdiobus_register(), but
>> mdio_mux_init() ignores the -EPROBE_DEFER from of_mdiobus_register() and just
>> continues on with the rest of the stuff and returns success as long as
>> one of the child nodes (in this case int_mdio) succeeds.
>>
>> Since the probe returns 0 without really succeeding, networking stuff
>> just fails badly after this. So, IMO, the real problem is with
>> mdio_mux_init() not propagating up the -EPROBE_DEFER. I gave Marc a
>> quick hack (pasted at the end of this email) to test my theory and he
>> confirmed that it fixes the issue (a few deferred probes later, things
>> work properly).
>>
>> Andrew, I don't see any good reason for mdio_mux_init() not
>> propagating the errors up correctly (at least for EPROBE_DEFER). I'll
>> send a patch to fix this. Please let me know if there's a reason it
>> has to stay as-is.
>
> I sent out the proper fix as a series:
> https://lore.kernel.org/lkml/20210804214333.927985-1-saravanak@xxxxxxxxxx/T/#t

Thanks a lot for digging here and providing the appropriate fixes !

Neil

>
> Marc, can you give it a shot please?
>
> -Saravana
>
>>
>> -Saravana
>>
>> index 110e4ee85785..d973a267151f 100644
>> --- a/drivers/net/mdio/mdio-mux.c
>> +++ b/drivers/net/mdio/mdio-mux.c
>> @@ -170,6 +170,9 @@ int mdio_mux_init(struct device *dev,
>> child_bus_node);
>> mdiobus_free(cb->mii_bus);
>> devm_kfree(dev, cb);
>> + /* Not a final fix. I think it can cause UAF issues. */
>> + mdio_mux_uninit(pb);
>> + return r;
>> } else {
>> cb->next = pb->children;
>> pb->children = cb;