Re: [PATCH net-next v12 08/18] net: ethernet: mtk_eth_soc: fix 1000Base-X and 2500Base-X modes
From: Russell King (Oracle)
Date: Wed Mar 08 2023 - 08:16:00 EST
On Wed, Mar 08, 2023 at 12:53:13PM +0000, Daniel Golle wrote:
> On Wed, Mar 08, 2023 at 12:41:43PM +0000, Russell King (Oracle) wrote:
> > On Wed, Mar 08, 2023 at 12:11:48PM +0000, Daniel Golle wrote:
> > > On Wed, Mar 08, 2023 at 11:35:40AM +0000, Russell King (Oracle) wrote:
> > > > On Tue, Mar 07, 2023 at 03:53:58PM +0000, Daniel Golle wrote:
> > > > > After conversion to phylink_pcs the 1000Base-X and 2500Base-X modes
> > > > > would work only after `ethtool -s eth1 autoneg off`.
> > > > > As ethtool autoneg and the ETHTOOL_LINK_MODE_Autoneg_BIT is supposed
> > > > > to control auto-negotiation on the external interface it doesn't make
> > > > > much sense to use it to control on-board SGMII auto-negotiation between
> > > > > MAC and PHY.
> > > > > Set correct values to really only enable SGMII auto-negotiation when
> > > > > actually operating in SGMII mode. For 1000Base-X and 2500Base-X mode,
> > > > > enable remote-fault detection only if in-band-status is enabled.
> > > > > This fixes using 1000Base-X and 2500Base-X SFPs on the BananaPi R3
> > > > > board and also makes it possible to use interface-mode-switching PHYs
> > > > > operating in either SGMII mode for 10M/100M/1000M or in 2500Base-X for
> > > > > 2500M mode on other boards.
> > > > >
> > > > > Fixes: 14a44ab0330d ("net: mtk_eth_soc: partially convert to phylink_pcs")
> > > > > Signed-off-by: Daniel Golle <daniel@xxxxxxxxxxxxxx>
> > > >
> > > > NAK.
> > > >
> > > > There are PHYs out there which operate in SGMII mode but do not
> > > > exchange the SGMII 16-bit configuration word. The code implemented
> > > > here by me was explicitly to allow such a configuration to work,
> > > > which is defined as:
> > > >
> > > > SGMII *without* mode == inband
> > > >
> > > > An example of this is the Broadcom 84881 PHY which can be found on
> > > > SFP modules.
> > >
> > > I also have multiple such 1000Base-T SFP modules here (finisar, AJYA),
> > > and this change doesn't touch the codepaths relevant for those. They
> > > are operating in SGMII mode, they have always been working fine.
> > >
> > > What I'm trying to fix here is 1000Base-X and 2500Base-X mode which
> > > has been broken by introducing ETHTOOL_LINK_MODE_Autoneg_BIT as the
> > > deciding factor for in-band AN here.
> >
> > ... which is correct.
> >
> > > Can you explain why ETHTOOL_LINK_MODE_Autoneg_BIT was used there in
> > > first place? Is my understanding of this bit controlling autoneg on the
> > > *external* interface rather than on the *system-side* interface wrong?
> >
> > Think about what 1000BASE-X is for. It's not really for internal links,
> > it's intended by IEEE 802.3 to be the 1G *media* side protocol for
> > 1000BASE-SX, 1000BASE-LX, 1000BASE-CX etc links.
> >
> > Therefore, when being used in that case, one may wish to disable
> > autoneg over the fibre link. Hence, turning off autoneg via ethtool
> > *should* turn off autoneg over the fibre link. So, using
> > ETHTOOL_LINK_MODE_Autoneg_BIT to gate 802.3z autonegotiation the
> > correct thing to do.
> >
> > If we have a PHY using 1000BASE-X, then it is at odds with the
> > primary purpose of this protocol, especially with it comes to AN.
> > This is why phylink used to refuse to accept PHYs when using 802.3z
> > mode, but Marek wanted this to work, so relaxed the checks
> > preventing such a setup working.
>
> Sadly 2500Base-X is very commonly used to connect 2500Base-T-capable
> PHYs or SFP modules. I also got an ATS branded 1000M/100M/10M copper
> SFP module which uses 1000Base-X as system-side interface, independently
> of the speed of the link partner on the TP interface.
> All of them do not work with inband-AN enabled and a link only comes
> up after `ethtool -s eth1 autoneg off` in the current implementation,
> while previously they were working just fine.
>
> I understand that there isn't really a good solution for 1000Base-X as
> thanks to you I now understand that SerDes autoneg just transparently
> ends up being autoneg on a fiber link.
>
> 2500Base-X, however, is hardly used for fiber links, but rather mostly
> for 2500Base-T PHYs and SFP module as well as xPON SFPs. Maybe we could
> at least have in-band AN disabled by default for those to get them
> working without requiring the user to carry out ethtool settings?
We could _possibly_ make 2500base-X ignore the autoneg bit, but in
order to do that I would want to make other changes, because this
is getting absolutely stupid to have these decisions being made in
each and every PCS - and have each PCS author implementing different
decision making in their PCS driver.
The problem that gives is it makes phylink maintenance in hard,
because it becomes impossible to predict what the effect of any
change is.
It also means that plugging the same SFP module into different
hardware will give different results (maybe it works, maybe it
doesn't.)
So, what I would want to do is to move the decision about whether
the PCS should enable in-band into the phylink core code instead
of these random decisions being made in each PCS.
At that point, we can then make the decision about whether the
ethtool autoneg bit should affect whether the PCS uses inband
depending on whether the PCS is effectively the media facing
entity, or whether there is a PHY attached - and if there is a PHY
attached, ask the PHY whether it will be using in-band or not.
This also would give a way to ensure that all PCS adopt the same
behaviour.
Does that sound a reasonable approach?
Strangely, I already have some patches along those lines in my
net-queue branch. See:
net: phylink: add helpers for decoding mode
net: use phylink_mode_*() helpers
net: phylink: split PCS in-band from inband mode
It's nowhere near finished though, it was just an idea back in
2021 when the problem of getting rid of differing PCS behaviours
was on my mind.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!