Bug(s) with netconsole (using mv643xx_eth on Kirkwood)

From: Alexander Holler
Date: Thu Apr 03 2014 - 13:58:38 EST


(I've changed the topic and removed stable@ from the cc-list to reflect the current status)

(Long mail, but hopefully a good problem description)

I have known about problems with netconsole and mv643xx_eth for
4 years, but didn't care much because everything else worked flawlessly;
I had even forgotten that I had enabled netconsole. (The bug I
experienced 4 years ago, seeing no messages remotely from netconsole, seems to
have disappeared.)

But now, using 3.14, I hit a bug which killed the ethernet with a 100%
success rate and, after digging a bit, I've come to the conclusion
that netconsole (together with a possibly broken initialization of the PHY) is the source of the problem.

The kernel is 3.14 (mainline) with one reverted patch (7cd1463). This patch changed the initialization of the PHY such that the ethernet dies 100% reproducibly on a Kirkwood 88F6281 based machine. Having reverted that patch, I get a one-line bug enabler:

------
diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
index e891b48..246f065 100644
--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -2095,7 +2095,8 @@ static void port_start(struct mv643xx_eth_private *mp)
 	struct ethtool_cmd cmd;
 
 	mv643xx_eth_get_settings(mp->dev, &cmd);
-	phy_reset(mp);
+	//phy_reset(mp);
+	phy_init_hw(mp->phy);
 	mv643xx_eth_set_settings(mp->dev, &cmd);
 	phy_start(mp->phy);
 }
------

First, here is what happens at boot:

- Bootloader (U-Boot) enables (somehow) the network such that it is usable as a console for the bootloader,
- kernel is loaded and started with netconsole enabled through the kernel command line (netconsole=...),
- eth driver probe => PHY reset,
- netconsole initializes the network (netpoll_setup) => PHY reset,
- userland starts,
- userland configures network (ip addr add fixedIP ..., a hack used for a very early ntpdate before the rootfs becomes rw); I'm not sure whether that ends up in another PHY reset,
- userland starts network by using dhcpcd => PHY reset.

Now several use cases:

Case 1:
Using plain 3.14, the last step fails with no carrier, because the PHY ends up in a never-ending reset (BMCR_RESET always set) in m88e1111_config_init(), called by phy_init_hw() in port_start() in mv643xx_eth.

Case 2:
Without enabling netconsole through the kernel command line, I see no problems.

Case 3:
If I enable the old phy_reset() in mv643xx_eth, I see no problems.

Case 4:
If I reduce the time the newly used reset in phy_init_hw() spends in
m88e1111_config_init() from calling mdelay(500) twice to a few milliseconds, by polling for a cleared BMCR_RESET instead, I see no problems.
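
As an illustration of the polling idea in case 4, here is a minimal userspace sketch. It is not the patch mentioned in this mail. The register and bit values match MII_BMCR and BMCR_RESET from linux/mii.h, but struct fake_phy, fake_phy_read() and poll_reset_done() are hypothetical names invented for this example, and the fake PHY simply clears its reset bit after a fixed number of reads.

```c
#include <assert.h>

#define MII_BMCR   0x00    /* basic mode control register */
#define BMCR_RESET 0x8000  /* self-clearing software reset bit */

/* Hypothetical stand-in for a PHY: reset stays set for a few reads. */
struct fake_phy {
	int reads_until_ready;
};

/* Hypothetical register read; only MII_BMCR is modelled. */
static int fake_phy_read(struct fake_phy *phy, int reg)
{
	if (reg != MII_BMCR)
		return 0;
	if (phy->reads_until_ready > 0) {
		phy->reads_until_ready--;
		return BMCR_RESET;
	}
	return 0;
}

/*
 * Poll until BMCR_RESET reads back as cleared.
 * Returns the number of polls used, or -1 if the bit is still
 * set after max_polls reads.
 */
static int poll_reset_done(struct fake_phy *phy, int max_polls)
{
	int i;

	assert(max_polls > 0);
	for (i = 1; i <= max_polls; i++) {
		int val = fake_phy_read(phy, MII_BMCR);

		if (!(val & BMCR_RESET))
			return i;
		/* Assumption: a driver would wait briefly here between polls. */
	}
	return -1;
}
```

The bounded loop means a PHY that never clears the bit shows up as an error return rather than as a silent hang or a fixed one-second wait.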

Case 5:
If I disable the initialization of the network in the bootloader, netconsole even worked 4 years ago. But I haven't looked into that case further, because I always want to use the network as a console for the bootloader.


Current assumption:

So, after having spent too much time diagnosing the above stuff (so I was right in ignoring the non-working netconsole for 4 years), I've come to the conclusion that some synchronization between netconsole/netpoll and the normal network stack or mv643xx_eth is missing. That would explain why the PHY ends up in a never-ending reset and why this only happens reproducibly if the PHY reset needs a whole second by using mdelay(500) twice (which likely is used to switch the task to netconsole in between). It might be a hw problem too (I haven't read the datasheet or looked for any errata).
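
To make the hypothesis concrete, the following single-threaded sketch models what "synchronization" could mean here: a guard that refuses to start a second reset sequence while one is still running. It is only a model of the assumption above, not kernel code and not a proposed fix. struct phy_model, phy_reset_begin() and phy_reset_end() are hypothetical names, and a real driver would need a proper lock that is safe in the contexts netpoll runs in.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a PHY whose reset sequence must not overlap. */
struct phy_model {
	bool reset_in_progress;  /* set while a reset sequence runs */
	int  resets_started;     /* sequences that actually began */
	int  resets_rejected;    /* attempts refused because one was running */
};

/* Try to begin a reset; refuse if another path is already mid-reset. */
static bool phy_reset_begin(struct phy_model *phy)
{
	if (phy->reset_in_progress) {
		phy->resets_rejected++;
		return false;
	}
	phy->reset_in_progress = true;
	phy->resets_started++;
	return true;
}

/* Mark the running reset sequence as finished. */
static void phy_reset_end(struct phy_model *phy)
{
	/* Ending a reset that never began would be a caller bug. */
	assert(phy->reset_in_progress);
	phy->reset_in_progress = false;
}
```

In this model, a second caller arriving between phy_reset_begin() and phy_reset_end() (for example during a long delay) is turned away instead of restarting the sequence.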

I hope everyone who missed some more information is happy now; otherwise
I have (again) wasted time typing a problem description (not to speak of the time already spent trying to diagnose the problem).

So go on and try to take the almost low hanging fruit. I'm not sure if I
will spend more time on that topic as I already have a working patch/workaround and the discussion has become a bit tiresome. Sorry.

Regards,

Alexander Holler