Re: PROBLEM: Broken or delayed ethernet on Xilinx ZCU104 since 5.18 (regression)

From: Russell King (Oracle)
Date: Fri Aug 04 2023 - 18:27:43 EST


On Fri, Aug 04, 2023 at 05:31:21PM -0400, Nick Bowler wrote:
> On 2023-08-04, Rob Herring <robh@xxxxxxxxxx> wrote:
> > On Fri, Aug 4, 2023 at 11:52 AM Nick Bowler <nbowler@xxxxxxxxxx> wrote:
> >> I don't know about the deferred probe timeout, but I bisected the 6.5-rc4
> >> breakage to this commit:
> >>
> >> commit c720a1f5e6ee8cb39c28435efc0819cec84d6ee2
> >> Author: Michal Simek <michal.simek@xxxxxxx>
> >> Date: Mon May 22 16:59:48 2023 +0200
> >>
> >> arm64: zynqmp: Describe TI phy as ethernet-phy-id
> >
> > I don't see anything obviously problematic with that commit. (The
> > #phy-cells property added is wrong as ethernet phys don't use the phy
> > binding, but that should just be ignored). I'd check if the phy probed
> > and has a DT node associated with it.
>
> I think the answer is "no, the phy was not probed". Without reverting
> that commit, there is absolutely nothing in /sys/bus/mdio_bus/devices.
> There is no phy device link under /sys/bus/mdio_bus/drivers/"TI DP83867",
> and there is no mdio_bus under /sys/bus/platform/devices/ff0e0000.ethernet.
>
> When I revert that commit, I can locate the phy device under all these
> locations.
>
> > fw_devlink tracks parent-child dependencies and maybe changing to
> > parent-grandchild affected that. We don't yet track 'phy-handle'
> > dependencies, but we'd have a circular one here if we did (though that
> > should be handled). Does "fw_devlink=off" help?
>
> Booting with fw_devlink=off results in no obvious change in behaviour.

I think we need to rewind a tad.

My understanding is that this uses the Cadence macb driver.

In your original message, you said that the ethernet device wasn't
being bound to its driver.

Since the ethernet driver is responsible for spotting the "mdio"
sub-node and creating the MDIO bus, if the driver isn't being
successfully bound, then the MDIO bus and the PHYs on the bus won't be
created, so you won't find them in /sys/bus/mdio_bus/devices.
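
To tell those two cases apart quickly, something like the rough Python
sketch below might do. It assumes sysfs is mounted at /sys and that the
platform device is called ff0e0000.ethernet as in your report; the helper
names are made up for illustration and are not anything the kernel provides.

```python
import os

def bound_driver(dev_name, sysfs="/sys"):
    """Return the name of the driver bound to a platform device, or None.

    sysfs gives a bound device a 'driver' symlink pointing at the driver
    directory; an unbound (or deferred) device has no such link.
    """
    link = os.path.join(sysfs, "bus", "platform", "devices", dev_name, "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))

def mdio_devices(sysfs="/sys"):
    """List whatever has been registered on the MDIO bus (may be empty)."""
    path = os.path.join(sysfs, "bus", "mdio_bus", "devices")
    try:
        return sorted(os.listdir(path))
    except FileNotFoundError:
        return []

if __name__ == "__main__":
    dev = "ff0e0000.ethernet"  # assumption: device name taken from the report
    print("bound driver:", bound_driver(dev))
    print("mdio devices:", mdio_devices())
```

If this reports no bound driver, an empty MDIO device list is expected and
says nothing about the PHY itself.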

Moreover, the Cadence macb driver uses phylink, and this doesn't care
about the presence of the PHY at probe time, only when the network
interface is brought up. See macb_phylink_connect() which is called from
macb_open().

So, I think that the deferred probing has nothing to do with PHYs, and
that's just a wild goose chase.

I think instead we need to be concentrating on what's going on with
the ethernet driver, and why the ethernet driver is deferring its
probe. Is macb_probe() getting called at all? How far through
macb_probe() do we get before we defer?

I think those are the key questions that need answering.

Maybe, if you can get access to the machine while the driver is
deferring, /sys/kernel/debug/devices_deferred might give some
useful information, but that's just a hope.
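
If it helps, here is a rough Python sketch for picking the relevant entry
out of that file. It assumes debugfs is mounted at /sys/kernel/debug, that
each line starts with the device name followed by whitespace and an optional
reason, and that the device is ff0e0000.ethernet; the function names are
hypothetical.

```python
def parse_deferred(text):
    """Map device name -> recorded reason ("" when no reason is given)."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Assumption: the device name is the first whitespace-delimited
        # field and anything after it is the deferral reason.
        parts = line.split(None, 1)
        name = parts[0]
        reason = parts[1].strip() if len(parts) > 1 else ""
        result[name] = reason
    return result

def deferred_reason(dev_name, path="/sys/kernel/debug/devices_deferred"):
    """Return the reason string, "" if deferred without a recorded reason,
    or None if the device is not on the deferred list at all."""
    with open(path) as f:
        return parse_deferred(f.read()).get(dev_name)
```

Reading debugfs normally needs root, so this would be run as, for example,
print(deferred_reason("ff0e0000.ethernet")) from a root shell. A result of
None would mean the device is not currently deferred.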

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!