Re: [PATCH net-next v12 08/18] net: ethernet: mtk_eth_soc: fix 1000Base-X and 2500Base-X modes

From: Russell King (Oracle)
Date: Wed Mar 08 2023 - 10:01:35 EST


On Wed, Mar 08, 2023 at 02:32:40PM +0000, Daniel Golle wrote:
> In general it sound reasonable. We may need more SFP qurik bits to
> indicate presence of a PHY on SFP modules which do not expose that
> PHY via i2c-mdio or otherwise let the host know about it's presence.

That's a whole load of fun - some modules where the PHY is inaccessible
will be using 1000base-X, others will be using SGMII. So yes, its
likely that we may need quirks for these. We don't have quirks yet
because you're the first to suggest there's a problem.

> For my TP-LINK TL-SM410U 2500Base-T SFP this unfortunately seems to
> be the case, and I assume it's actually like that for most
> 2500Base-T as well as xPON SFPs... (xPON SFPs are usually managed
> via high-level protocols, even Web-UI is common there. They don't
> tell you much about them via I2C, I suppose to get them to work in
> as many SFP host devices as possible without any software changes).

xPON SFPs are a whole different ball game. For some, they auto-detect
while booting and try 2500base-X or 1000base-X to see which will sync
and if not they try the other. Other xPON SFPs run their host interface
at a speed determined by the configuration set by the remote end. Other
xPON SFPs may do something entirely different.

In many cases, their EEPROM is a full of errors - such as advertising
that they're 1.2 or 1.3 Gbd while operating in 2500base-X mode.

They do weird stuff with their status pins as well, for example, some
use the RX_LOS pin as a uart - which is a problem if it's e.g. tied
from the cage to a switch that uses the pin to gate the link-up
indication without any software control of that!

With xPON SFPs, it's just a total minefield, which lots of SFF MSA
violations all over the place. They're essentially a law to themselves
(this is exactly why we have the quirks infrastructure.)

> FYI:
> TP-LINK TL-SM410U 2500Base-T module:
>
> sfp EE: 00000000: 03 04 07 00 00 00 00 00 00 40 00 01 1f 00 00 00 .........@......
> sfp EE: 00000010: 00 00 00 00 54 50 2d 4c 49 4e 4b 20 20 20 20 20 ....TP-LINK
> sfp EE: 00000020: 20 20 20 20 00 30 b5 c2 54 4c 2d 53 4d 34 31 30 .0..TL-SM410
> sfp EE: 00000030: 55 20 20 20 20 20 20 20 32 2e 30 20 00 00 00 1b U 2.0 ....
> sfp EE: 00000040: 00 08 01 00 80 ff ff ff 40 3d f0 0d c0 ff ff ff ........@=......
> sfp EE: 00000050: c8 39 7a 08 c0 ff ff ff 50 3d f0 0d c0 ff ff ff .9z.....P=......
> sfp sfp2: module TP-LINK TL-SM410U rev 2.0 sn 12260M4001782 dc 220622

I'm guessing this is a module with a checksum problem...

> And this is the ATS SFP-GE-T 10/100/1000M copper module doing
> rate-adaptation to 1000Base-X:
>
> sfp sfp1: EEPROM extended structure checksum failure: 0xb0 != 0xaf

Given how close that is, it looks like they used the wrong algorithm.

> sfp EE: 00000000: 03 04 07 00 00 00 02 12 00 01 01 01 0c 00 03 00 ................
> sfp EE: 00000010: 00 00 00 00 4f 45 4d 20 20 20 20 20 20 20 20 20 ....OEM
> sfp EE: 00000020: 20 20 20 20 00 00 90 65 53 46 50 2d 47 45 2d 54 ...eSFP-GE-T
> sfp EE: 00000030: 20 20 20 20 00 00 00 00 43 20 20 20 00 00 00 f0 ....C ....
> sfp EE: 00000040: 00 12 00 00 32 31 30 37 31 30 41 30 30 31 32 37 ....210710A00127
> sfp EE: 00000050: 33 39 00 00 32 31 30 37 31 30 20 20 60 00 01 af 39..210710 `...
> sfp sfp1: module OEM SFP-GE-T rev C sn dc

Welcome to the wonderful world of horribly broken SFPs.

Do we know what form of rate adaption this module needs on the
transmit path? Does it require the host to pace itself to the media
speed (which I suspect will be unreadable if the PHY isn't accessible)
or will it send pause frames?

It would be nice to add these to my database - please send me the
output of ethtool -m $iface raw on > foo.bin for each module.

> That one already needs quirks to even work at all as TX-FAULT is not
> reported properly by the module, see
>
> https://github.com/dangowrt/linux/commit/2c694bd494583f08858fabca97cfdc79de8ba089

I'm guessing that's not on a kernel version that has:

73472c830eae net: sfp: add support for HALNy GPON SFP
5029be761161 net: sfp: move Huawei MA5671A fixup
275416754e9a net: sfp: move Alcatel Lucent 3FE46541AA fixup
23571c7b9643 net: sfp: move quirk handling into sfp.c
8475c4b70b04 net: sfp: re-implement soft state polling setup

which reworks how we deal with the soft/hard state signals.

I think the problem space is growing, and I fear that if we try to
address all these issues in one go, we're going to end up with way
too much to deal with in one go (which means poor reviews etc.)

Can we try to concentrate on fixing one problem at a time, rather
than throwing a whole load of problems into the mix?

Thanks.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!