Re: PROBLEM: Broken or delayed ethernet on Xilinx ZCU104 since 5.18 (regression)

From: Rob Herring
Date: Fri Aug 04 2023 - 13:03:41 EST


On Fri, Aug 4, 2023 at 10:54 AM Nick Bowler <nbowler@xxxxxxxxxx> wrote:
>
> On 2023-08-04, Nick Bowler <nbowler@xxxxxxxxxx> wrote:
> > On 04/08/2023, Rob Herring <robh@xxxxxxxxxx> wrote:
> >> On Fri, Aug 4, 2023 at 9:27 AM Nick Bowler <nbowler@xxxxxxxxxx> wrote:
> >>> commit e461bd6f43f4e568f7436a8b6bc21c4ce6914c36
> >>> Author: Robert Hancock <robert.hancock@xxxxxxxxxx>
> >>> Date:   Thu Jan 27 10:37:36 2022 -0600
> >>>
> >>>     arm64: dts: zynqmp: Added GEM reset definitions
> >>>
> >>> Reverting this fixes the problem on 5.18. Reverting this fixes the
> >>> problem on 6.1. Reverting this fixes the problem on 6.4. In all of
> >>> these versions, with this change reverted, the network device appears
> >>> without delay.
> >>
> >> With the above change, the kernel is going to be waiting for the reset
> >> driver which either didn't exist or wasn't enabled in your config
> >> (maybe kconfig needs to be tweaked to enable it automatically).
> >
> > The dts defines a reset-controller node with
> >
> > compatible = "xlnx,zynqmp-reset"
> >
> > As far as I can see, this is supposed to be handled by the
> > drivers/reset/reset-zynqmp.c driver. It is enabled by CONFIG_ARCH_ZYNQMP,
> > which I have set to "y", and it appears to be getting compiled in (that
> > is, there is a drivers/reset/reset-zynqmp.o file in the build directory).
>
> Oh, I get it, to include this driver I need to also enable:
>
> CONFIG_RESET_CONTROLLER=y
>
> Setting this fixes 6.4. Perhaps CONFIG_ARCH_ZYNQMP should select it?

Maybe. Do other platforms do that?

> I guess the reset-zynqmp.o file that was in my build directory must
> have been leftover garbage from a long time ago.
>
> However, even with this option enabled, 6.5-rc4 remains broken (no
> change in behaviour wrt. the network device). I will bisect this
> now.

It would be good to know why the deferred probe timeout doesn't work.
If you disable modules, the kernel shouldn't wait past late_initcall.
Though this functionality keeps getting tweaked, so I may be off on
the current behavior.

Rob