Re: [PATCH v1 1/2] driver core: fw_devlink: Add support for FWNODE_FLAG_BROKEN_PARENT

From: Vladimir Oltean
Date: Tue Aug 31 2021 - 21:28:34 EST


On Wed, Sep 01, 2021 at 02:18:04AM +0300, Vladimir Oltean wrote:
> On Wed, Sep 01, 2021 at 01:02:09AM +0200, Andrew Lunn wrote:
> > Rev B is interesting because switch0 and switch1 got genphy, while
> > switch2 got the correct Marvell PHY driver. switch2 PHYs don't have
> > interrupt properties, so don't loop back to their parent device.
>
> This is interesting and not what I really expected to happen. It goes to
> show that we really need more time to understand all the subtleties of
> device dependencies before jumping on patching stuff.

There is an even more interesting variation which I would like to point
out. It seems like a very odd loophole in the device links.

Take the example of the mv88e6xxx DSA driver. On my board
(arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts), even after I
had to declare the switches as interrupt controller and add interrupts
to their internal PHYs, I still need considerable force to 'break' this
board in the way discussed in this thread. The correct PHY driver insists
to probe, and not genphy. Let me explain.

The automatic device links between the switch (supplier, as interrupt-controller)
and PHYs (consumers) are added by fwnode_link_add, called from of_link_to_phandle.

Important note: fwnode_link_add does not link devices, it links OF nodes.

Even more important node, in the form of a comment:

* The driver core will use the fwnode link to create a device link between the
* two device objects corresponding to @con and @sup when they are created. The
* driver core will automatically delete the fwnode link between @con and @sup
* after doing that.

Okay?!

What seems to be omitted is that the DSA switch driver's probing itself
can be deferred. For example:

dsa_register_switch
-> dsa_switch_probe
-> dsa_switch_parse_of
-> dsa_switch_parse_ports_of
-> dsa_port_parse_of
-> of_find_net_device_by_node(of_parse_phandle(dn, "ethernet", 0));
-> not found => return -EPROBE_DEFER

When dsa_register_switch() returns -EPROBE_DEFER, it is effectively
an error path. So the reverse of initialization is performed.

The mv88e6xxx driver calls mv88e6xxx_mdios_register() right _before_
dsa_register_switch. So when dsa_register_switch returns error code,
mv88e6xxx_mdios_unregister() will be called.

When mv88e6xxx_mdios_unregister() is called, the MDIO buses with
internal PHYs are destroyed. So the PHY devices themselves are destroyed
too. And the device links between the DSA switch and the internal PHYs,
those created based on the firmware node links created by fwnode_link_add,
are dropped too.

Now remember the comment that the device links created based on
fwnode_link_add are not restored.

So probing of the DSA switch finally resumes, and this time
device_links_check_suppliers() is effectively bypassed, the PHYs no
longer request probe deferral due to their supplier not being ready,
because the device link no longer exists.

Isn't this self-sabotaging?!

Now generally, DSA drivers defer probing because they probe in parallel
with the DSA master. This is typical if the switch is on a SPI bus, or
I2C, or on an MDIO bus provided by a _standalone_ MDIO controller.

If the MDIO controller is not standalone, but is provided by Ethernet
controller that is the DSA master itself, then things change a lot,
because probing can never be parallel. The DSA master probes,
initializes its MDIO bus, and this triggers the probing of the MDIO
devices on that bus, one of which is the DSA switch. So DSA can no
longer defer the probe due to that reason.

Secondly, in DSA we even have variation between drivers as to where they
register their internal MDIO buses. The mv88e6xxx driver does this in
mv88e6xxx_probe (the probe function on the MDIO bus). The rtl8366rb
driver calls realtek_smi_setup_mdio() from rtl8366rb_setup(), and this
is important. DSA provides drivers with a .setup() callback, which is
guaranteed to take place after nothing can defer the switch's probe
anymore.

So putting two and two together, sure enough, if I move mv88e6xxx_mdios_register
from mv88e6xxx_probe to mv88e6xxx_setup, then I can reliably break this
setup, because the device links framework isn't sabotaging itself anymore.

Conversely, I am pretty sure that if rtl8366rb was to call of_mdiobus_register()
from the probe method and not the setup method, the entire design issue
with interrupts on internal DSA switch ports would have went absolutely
unnoticed for a few more years.

I have not tested this, but it also seems plausible that DSA can
trivially and reliably bypass any fw_devlink=on restrictions by simply
moving all of_mdiobus_register() driver calls from the .setup() method
to their respective probe methods (prior to calling dsa_register_switch),
then effectively fabricate an -EPROBE_DEFER during the first probe attempt.
I mean, who will know whether that probe deferral request was justified
or not?

Anyway, I'm not sure everyone agrees with this type of "solution" (even
though it's worth pointing it out as a fw_devlink limitation). In any
case, we need some sort of lightweight "fix" to the chicken-and-egg
problem, which will give me enough time to think of something better.
I hope it is at least clearer now that there are subtleties and nuances,
and we cannot just assess how many boards are broken by looking at the
device trees. By design, all are, sure, but they might still work, and
that's better than nothing...