Re: PROBLEM: Broken or delayed ethernet on Xilinx ZCU104 since 5.18 (regression)

From: Linux regression tracking (Thorsten Leemhuis)
Date: Fri Aug 04 2023 - 11:46:01 EST


[adding Robert Hancock (the author of the likely culprit) to the list of
recipients as well as the network maintainers]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 04.08.23 17:26, Nick Bowler wrote:
> Hi,
>
> With recent kernels (5.18 and newer) the ethernet is all wonky on my
> ZCU104 board.
>
> There is some behaviour inconsistency between kernel versions identified
> during bisection, so maybe there is more than one issue with the ethernet?
>
> 6.5-rc4: after 10 seconds, the following message is printed:
>
> [ 10.761808] platform ff0e0000.ethernet: deferred probe pending
>
> but the network device seemingly never appears (I waited about a minute).
>
> 6.1 and 6.4: after 10 seconds, the device suddenly appears and starts
> working (but this is way too late).
>
> 5.18: the device never appears and no unusual messages are printed
> (I waited ten minutes).
>
> With 5.17 and earlier versions, the eth0 device appears without any delay.
>
> Unfortunately, as bisection closed in on the problematic section, all the
> built kernels became untestable as they appear to crash during early
> boot. Nevertheless, I manually selected a commit that sounded relevant:
>
> commit e461bd6f43f4e568f7436a8b6bc21c4ce6914c36
> Author: Robert Hancock <robert.hancock@xxxxxxxxxx>
> Date: Thu Jan 27 10:37:36 2022 -0600
>
> arm64: dts: zynqmp: Added GEM reset definitions
>
> Reverting this fixes the problem on 5.18. Reverting this fixes the
> problem on 6.1. Reverting this fixes the problem on 6.4. In all of
> these versions, with this change reverted, the network device appears
> without delay.
>
> Unfortunately, it seems this is not sufficient to correct the problem on
> 6.5-rc4 -- there is no apparent change in behaviour, so maybe there is
> a new, different problem?
>
> I guess I can kick off another bisection to find out when this revert
> stops fixing things...
>
> Let me know if you need any more info!

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced e461bd6f43f4e5
#regzbot title net/arm64: dts: Broken or delayed ethernet on Xilinx ZCU104
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it is already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.