Hello,
this patch became commit e2f016cf775129c050d6c79483073423db15c79a and is
contained in v5.11-rc1.
It broke wake-on-lan on my NAS (an ARM machine with an Armada 370 SoC,
armada-370-netgear-rn104.dts). The PHY driver in use is marvell.c. I'm
only reporting it now because I just upgraded that machine from Debian 11
(with kernel 5.10.x) to Debian 12 (with kernel 6.1.x).
Commenting out phy_disable_interrupts(...) in v6.1.41's phy_shutdown()
fixes the problem for me.
On Sun, Nov 01, 2020 at 02:50:57PM +0200, Ioana Ciornei wrote:
> In case of a board which uses a shared IRQ we can easily end up with an
> IRQ storm after a forced reboot.
>
> For example, a 'reboot -f' will trigger a call to the .shutdown()
> callbacks of all devices. Because phylib does not implement that hook,
> the PHY is not quiesced, thus it can very well leave its IRQ enabled.
>
> At the next boot, if that IRQ line is found asserted by the first PHY
> driver that uses it, but _before_ the driver that is _actually_ keeping
> the shared IRQ asserted is probed, the IRQ is not going to be
> acknowledged, thus it will keep being fired preventing the boot process
> of the kernel to continue. This is even worse when the second PHY driver
> is a module.
>
> To fix this, implement the .shutdown() callback and disable the
> interrupts if these are used.
I don't know how this should interact with wake-on-lan, but I would
expect that there is a way to fix this without reintroducing the problem
fixed by this change. However, I cannot say whether this needs fixing in
the generic phy code or in the phy driver. Any hints?
> Note that we are still susceptible to IRQ storms if the previous kernel
> exited with a panic or if the bootloader left the shared IRQ active, but
> there is absolutely nothing we can do about these cases.
I'd say the bootloader could handle that, though I know that for some
machines changing the bootloader isn't an option.