Re: [PATCH] [RFC] net: phy: Fix reboot crash if CONFIG_IP_PNP is not set

From: Ioana Ciornei
Date: Mon Jan 04 2021 - 09:54:36 EST



Hi Geert,

On Mon, Jan 04, 2021 at 01:24:15PM +0100, Geert Uytterhoeven wrote:
> Wolfram reports that his R-Car H2-based Lager board can no longer be
> rebooted in v5.11-rc1, as it crashes with an imprecise external abort.
> The issue can be reproduced on other boards (e.g. Koelsch with R-Car
> M2-W) too, if CONFIG_IP_PNP is disabled:

What kind of PHYs are used on these boards?

>
> Unhandled fault: imprecise external abort (0x1406) at 0x00000000
> pgd = (ptrval)
> [00000000] *pgd=422b6835, *pte=00000000, *ppte=00000000
> Internal error: : 1406 [#1] ARM
> Modules linked in:
> CPU: 0 PID: 1105 Comm: init Tainted: G W 5.10.0-rc1-00402-ge2f016cf7751 #1048
> Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
> PC is at sh_mdio_ctrl+0x44/0x60
> LR is at sh_mmd_ctrl+0x20/0x24
> ...
> Backtrace:
> [<c0451f30>] (sh_mdio_ctrl) from [<c0451fd4>] (sh_mmd_ctrl+0x20/0x24)
> r7:0000001f r6:00000020 r5:00000002 r4:c22a1dc4
> [<c0451fb4>] (sh_mmd_ctrl) from [<c044fc18>] (mdiobb_cmd+0x38/0xa8)
> [<c044fbe0>] (mdiobb_cmd) from [<c044feb8>] (mdiobb_read+0x58/0xdc)
> r9:c229f844 r8:c0c329dc r7:c221e000 r6:00000001 r5:c22a1dc4 r4:00000001
> [<c044fe60>] (mdiobb_read) from [<c044c854>] (__mdiobus_read+0x74/0xe0)
> r7:0000001f r6:00000001 r5:c221e000 r4:c221e000
> [<c044c7e0>] (__mdiobus_read) from [<c044c9d8>] (mdiobus_read+0x40/0x54)
> r7:0000001f r6:00000001 r5:c221e000 r4:c221e458
> [<c044c998>] (mdiobus_read) from [<c044d678>] (phy_read+0x1c/0x20)
> r7:ffffe000 r6:c221e470 r5:00000200 r4:c229f800
> [<c044d65c>] (phy_read) from [<c044d94c>] (kszphy_config_intr+0x44/0x80)
> [<c044d908>] (kszphy_config_intr) from [<c044694c>] (phy_disable_interrupts+0x44/0x50)
> r5:c229f800 r4:c229f800
> [<c0446908>] (phy_disable_interrupts) from [<c0449370>] (phy_shutdown+0x18/0x1c)
> r5:c229f800 r4:c229f804
> [<c0449358>] (phy_shutdown) from [<c040066c>] (device_shutdown+0x168/0x1f8)
> [<c0400504>] (device_shutdown) from [<c013de44>] (kernel_restart_prepare+0x3c/0x48)
> r9:c22d2000 r8:c0100264 r7:c0b0d034 r6:00000000 r5:4321fedc r4:00000000
> [<c013de08>] (kernel_restart_prepare) from [<c013dee0>] (kernel_restart+0x1c/0x60)
> [<c013dec4>] (kernel_restart) from [<c013e1d8>] (__do_sys_reboot+0x168/0x208)
> r5:4321fedc r4:01234567
> [<c013e070>] (__do_sys_reboot) from [<c013e2e8>] (sys_reboot+0x18/0x1c)
> r7:00000058 r6:00000000 r5:00000000 r4:00000000
> [<c013e2d0>] (sys_reboot) from [<c0100060>] (ret_fast_syscall+0x0/0x54)
>
> Calling phy_disable_interrupts() unconditionally means that the PHY
> registers may be accessed while the device is suspended, causing
> undefined behavior, which may crash the system.
>
> Fix this by calling phy_disable_interrupts() only when the PHY has been
> started.
>
> Reported-by: Wolfram Sang <wsa+renesas@xxxxxxxxxxxxxxxxxxxx>
> Fixes: e2f016cf775129c0 ("net: phy: add a shutdown procedure")
> Signed-off-by: Geert Uytterhoeven <geert+renesas@xxxxxxxxx>
> ---
> Marked RFC as I do not know if this change breaks the use case fixed by
> the faulty commit.

I haven't tested it yet but most probably this change would partially
revert the behavior to how things were before adding the shutdown
procedure.

This is because the interrupts are enabled at phy_connect() time, not at
phy_start() time, so we want to disable any PHY interrupts at shutdown
even if the PHY has not been started yet.
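
To illustrate the concern, here is a toy user-space model. It is not
kernel code: every name in it is made up for illustration, and it only
assumes what is described above, namely that interrupts get enabled at
connect time and that start merely kicks off the state machine.

```c
/* Toy user-space model, NOT kernel code; all toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_phy {
	bool irq_enabled;	/* interrupt source unmasked in the PHY */
	bool started;		/* state machine running (after "start") */
};

/* Models the connect step: interrupts are already enabled here. */
void toy_connect(struct toy_phy *phy)
{
	phy->irq_enabled = true;
}

/* Models the start step: only the state machine is started. */
void toy_start(struct toy_phy *phy)
{
	phy->started = true;
}

/* Models the current shutdown behavior: always disable interrupts. */
void toy_shutdown_unconditional(struct toy_phy *phy)
{
	phy->irq_enabled = false;
}

/* Models the behavior proposed in this RFC: disable only if started. */
void toy_shutdown_if_started(struct toy_phy *phy)
{
	if (phy->started)
		phy->irq_enabled = false;
}

/*
 * Connect a PHY, optionally start it, run the given shutdown variant,
 * and report whether interrupts are still enabled afterwards.
 */
bool toy_irq_left_enabled(void (*shutdown)(struct toy_phy *), bool start_first)
{
	struct toy_phy phy = { false, false };

	assert(shutdown);
	toy_connect(&phy);
	if (start_first)
		toy_start(&phy);
	shutdown(&phy);
	return phy.irq_enabled;
}
```

In this model, a PHY that was connected but never started keeps its
interrupts enabled across shutdown with the conditional variant, which
is the situation the shutdown procedure was meant to avoid.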

> Alternatively, the device may have to be started
> explicitly first.

Have you actually tried this out, and did it work?

I am asking because I would rather expect this to be a problem with how
the sh_eth driver behaves when the netdevice has not connected to the
PHY (this is done in .open(), alongside phy_start()) and it suddenly
has to interact with it through the mdiobb_ops callbacks.

Also, I just re-tested this use case, in which I do not start the
interface and just issue a reboot, and it behaves as expected.

> ---
> drivers/net/phy/phy_device.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 80c2e646c0934311..5985061b00128f8a 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -2962,7 +2962,8 @@ static void phy_shutdown(struct device *dev)
> {
> struct phy_device *phydev = to_phy_device(dev);
>
> - phy_disable_interrupts(phydev);
> + if (phy_is_started(phydev))
> + phy_disable_interrupts(phydev);
> }
>
> /**
> --
> 2.25.1
>