Re: PROBLEM: Broken or delayed ethernet on Xilinx ZCU104 since 5.18 (regression)
From: Rob Herring
Date: Fri Aug 04 2023 - 11:52:48 EST
On Fri, Aug 4, 2023 at 9:27 AM Nick Bowler <nbowler@xxxxxxxxxx> wrote:
>
> Hi,
>
> With recent kernels (5.18 and newer) the ethernet is all wonky on my
> ZCU104 board.
>
> There is some behaviour inconsistency between kernel versions identified
> during bisection, so maybe there is more than one issue with the ethernet?
>
> 6.5-rc4: after 10 seconds, the following message is printed:
>
> [ 10.761808] platform ff0e0000.ethernet: deferred probe pending
>
> but the network device seemingly never appears (I waited about a minute).
>
> 6.1 and 6.4: after 10 seconds, the device suddenly appears and starts
> working (but this is way too late).

The 10 seconds is probably the deferred probe timeout. You can lower it
with deferred_probe_timeout= on the kernel command line.

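For reference, a rough sketch of checking whether that timeout is set on a
given command line (for example, the contents of /proc/cmdline). The helper
name and the sample command line are made up for illustration; only the
deferred_probe_timeout= parameter itself comes from the kernel.

```python
from typing import Optional


def deferred_probe_timeout(cmdline: str) -> Optional[int]:
    """Return the deferred_probe_timeout value (in seconds) found in a
    kernel command line string, or None if it is absent or not a number."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "deferred_probe_timeout" and sep:
            try:
                # Assumption for this sketch: a later occurrence overrides
                # an earlier one.
                value = int(val)
            except ValueError:
                pass
    return value


# Hypothetical command line, not taken from the report.
example = "console=ttyPS0,115200 root=/dev/mmcblk0p2 deferred_probe_timeout=2"
print(deferred_probe_timeout(example))
```

On a running system you could pass it the text read from /proc/cmdline
instead of the sample string.
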
> 5.18: the device never appears and no unusual messages are printed
> (I waited ten minutes).
>
> With 5.17 and earlier versions, the eth0 device appears without any delay.
>
> Unfortunately, as bisection closed on the problematic section, all the
> built kernels became untestable as they appear to crash during early
> boot. Nevertheless, I manually selected a commit that sounded relevant:
>
> commit e461bd6f43f4e568f7436a8b6bc21c4ce6914c36
> Author: Robert Hancock <robert.hancock@xxxxxxxxxx>
> Date: Thu Jan 27 10:37:36 2022 -0600
>
> arm64: dts: zynqmp: Added GEM reset definitions
>
> Reverting this fixes the problem on 5.18. Reverting this fixes the
> problem on 6.1. Reverting this fixes the problem on 6.4. In all of
> these versions, with this change reverted, the network device appears
> without delay.

With the above change, the kernel is going to wait for the reset driver,
which either doesn't exist or isn't enabled in your config (maybe kconfig
needs to be tweaked to enable it automatically). There's not really a
better solution than the probe timeout when the DT was incomplete and new
dependencies get added later.

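If you want to check the config side, something like the following would
do. It is only a sketch: the helper name and the sample .config are made
up, and which symbol actually provides the reset driver on this board is an
assumption you would need to confirm against the tree.

```python
import re


def config_state(config_text: str, symbol: str) -> str:
    """Return 'y', 'm', or 'n' for a symbol in kernel .config text.

    A symbol that is absent, or that appears only as
    '# CONFIG_FOO is not set', is reported as 'n'.
    """
    pattern = re.compile(r"^CONFIG_%s=(.*)$" % re.escape(symbol), re.MULTILINE)
    match = pattern.search(config_text)
    if match and match.group(1) in ("y", "m"):
        return match.group(1)
    return "n"


# Made-up .config excerpt for illustration only.
sample = """\
CONFIG_ARCH_ZYNQMP=y
# CONFIG_RESET_CONTROLLER is not set
CONFIG_MACB=y
"""

for sym in ("RESET_CONTROLLER", "MACB"):
    print(sym, config_state(sample, sym))
```
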
> Unfortunately, it seems this is not sufficient to correct the problem on
> 6.5-rc4 -- there is no apparent change in behaviour, so maybe there is
> a new, different problem?

Probably. You might check what changed with fw_devlink in that period.
(Offhand, I don't recall many changes.)

> I guess I can kick off another bisection to find out when this revert
> stops fixing things...

That always helps.

Rob