Re: [PATCH net-next v12 09/18] net: ethernet: mtk_eth_soc: Fix link status for none-SGMII modes

From: Russell King (Oracle)
Date: Wed Mar 08 2023 - 11:25:10 EST


On Wed, Mar 08, 2023 at 12:05:19PM +0000, Daniel Golle wrote:
> On Wed, Mar 08, 2023 at 12:44:57PM +0100, Frank Wunderlich wrote:
> > Am 8. März 2023 12:38:31 MEZ schrieb "Russell King (Oracle)" <linux@xxxxxxxxxxxxxxx>:
> > >On Tue, Mar 07, 2023 at 03:54:11PM +0000, Daniel Golle wrote:
> > >> Link partner advertised link modes are not reported by the SerDes
> > >> hardware if not operating in SGMII mode. Hence we cannot use
> > >> phylink_mii_c22_pcs_decode_state() in this case.
> > >> Implement reporting link and an_complete only and use speed according to
> > >> the interface mode.
> > >>
> > >> Fixes: 14a44ab0330d ("net: mtk_eth_soc: partially convert to phylink_pcs")
> > >> Signed-off-by: Daniel Golle <daniel@xxxxxxxxxxxxxx>
> > >
> > >This has been proven to work by Frank Wunderlich last October, so by
> > >making this change, you will be regressing his setup.
> >
> > Hi
> >
> > My tests were done with 1 kind of 1g fibre sfp as i only have these atm...have ordered some 2g5 rj54 ones,but don't have them yet. I'm not sure if they are working with/without sgmii (1000base-X) and if they have builtin phy.
> >
> > Daniel have a lot more of different SFPs and some (especially 2g5) were not working after our pcs change.
>
> Exactly. 1 GBit/s SFPs with built-in PHY and using SGMII are working
> fine before and after conversion to phylink_pcs. 1000Base-X and
> 2500Base-X PHYs and SFPs were broken after this.
>
> The patch about (and the other one you already NACK'ed) fixes those
> codepaths which were simply not used in Frank's setup.

You're replying to "Frank" but I think the "you" you are using there
is addressed to me.

I beg to differ with your assessment over whether the code paths were
used or not. Here's the proof from Frank's testing:

https://lore.kernel.org/all/trinity-4470b00b-771b-466e-9f3a-a3df72758208-1666435920485@3c-app-gmx-bs49/

The values at offset 8 (0x40e041a0) _are_ 1000base-X AN values, not
SGMII. There are many messages in that thread showing valid 1000base-X
AN values in register 8 (that being the local advertisement in the
lower 16 bits, and the partner's advertisement in the upper 16 bits.

The link partner in this case sent 0x40e0, which is:

#define LPA_1000XFULL 0x0020 /* Can do 1000BASE-X full-duplex */
#define LPA_1000XHALF 0x0040 /* Can do 1000BASE-X half-duplex */
#define LPA_1000XPAUSE 0x0080 /* Can do 1000BASE-X pause */
#define LPA_LPACK 0x4000 /* Link partner acked us */

and our advertisement was 0x41a0:

#define ADVERTISE_1000XFULL 0x0020 /* Try for 1000BASE-X full-duplex */
#define ADVERTISE_1000XPAUSE 0x0080 /* Try for 1000BASE-X pause */
#define ADVERTISE_1000XPSE_ASYM 0x0100 /* Try for 1000BASE-X asym pause */
#define ADVERTISE_LPACK 0x4000 /* Ack link partners response */

(bit 14 is managed by the PCS.)

> I have a ballpark of different SFPs and MediaTek boards with different
> PHYs here and tried all of them.
>
> I have no way to tell if the SFPs and PHYs which stopped working after
> the phylink_pcs conversion are sending valid advertisement. The only
> other boards with SFP slots I got here are RealTek-based switches, and
> all I can say is that on an RTL8380 based 1G switch both, the SFP
> modules containing a PHY and operating in SGMII mode as well as the
> ones without a PHY exposed via i2c-mdio and operating in 1000Base-X
> mode are working fine with that switch, with both, stock firmware and
> OpenWrt running on it.
>
> However, even should they not send valid advertisement, they are very
> common parts and they were working before and not after the change to
> phylink_pcs, for the reasons mentioned in the description of this
> patch.

Let me re-iterate in a different way to make this crystal clear: the
fibre SFP is completely transparent with respect to the PCS-to-PCS
link.

The clause 37 autonegotiation is between the PCS at one end and the PCS
at the other end. The fibre SFP is not involved in it. All that fibre
SFPs do is convert the serdes waveform into a varying optical intensity
and back again to a serdes waveform. They do not attempt to interpret
the waveform, but they may shape it via retimer circuitry and adjust
the gain to provide a compliant electrical signal.

So, if the PCS at the far end of the link is sending a zero
advertisement or not sending an advertisement at all, then the LPA
register will contain a zero irrespective of the fibre SFP module
being used.

Another case where it may give a zero value but with a copper SFP
is if the copper SFP is using 1000base-X or SGMII but without
sending an advertisement. As I've already stated, BCM84881 is
known to do this, which is used on some copper SFP modules. It
may be that some copper SFP modules have the PHY setup to use
1000base-X but without any clause 37 advertisement - that would
not surprise me in the least.


I would suggest that if you replaced your mediatek board in your
setup with something else, e.g. an Armada 388 based Clearfog board
which is known to work correctly, used the same fibre SFPs, you'll
find that negotiation would not complete and the link wouldn't
come up. Then if you use the same fibre SFPs that I do at both
ends, you'll find the same problem (because the fibre SFPs make
no difference as far as the in-band AN is concerned.)

I hope this helps.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!