Re: [PATCH net-next] net: phy: populate host_interfaces when attaching PHY

From: Russell King (Oracle)
Date: Thu Oct 10 2024 - 11:04:26 EST


On Thu, Oct 10, 2024 at 03:07:30PM +0100, Daniel Golle wrote:
> > Note that this interface switching mechanism was introduced early on
> > with the 88x3310 PHY, before any documentation on it was available,
> > and all that was known at the time is that the PHY switched between
> > different MAC facing interface modes depending on the negotiated
> > speed. It needed to be supported, and what we have came out of that.
> > Legacy is important, due to the "no regressions" rule that we have
> > in kernel development - we can't go breaking already working setups.
>
> What about marking Ethernet drivers which are capable of interface
> mode switching? Right now there isn't one "correct" thing to do for
> PHY drivers, which is bad, as people may look into any driver as
> a reference for the development of future drivers.
>
> So why not introduce a MAC capability bit? Even dedicated for switching
> between two specific modes (SGMII and 2500Base-X), to avoid any
> ambiguitities or unnecessary feature-creep.

They already have a perfectly good way to do this today. They can look
at DT and set just one mode in the ->supported_interfaces bitmap if
they don't support interface switching! The MAC drivers are already
responsible for parsing the phy-mode from firmware, and it's that
driver that also knows whether it knows if it supports interface
switching or not. So I don't see any need for additional capability
bits.

> Typically switching between modes within the same PCS is more easy to
> support, switching from 10:8 to 66:64 coding can be more involved,
> and require resets which affect other links, so that's something
> worth avoiding unless we really need it (in case the users inserts
> a different SFP module it would be really needed, in case the external
> link goes from 5 Gbit/s to 2.5 Gbit/s it might be worth avoiding
> having to switch from 5GBase-R to 2500Base-X)
>
> It wouldn't be the first time something like that would be done, however
> I have full understanding that any reminder of that whole
> legacy_pre_march2020 episode may trigger post-traumatic stress in netdev
> maintainers.

Yes, and it's still continuing to cause problems today. I've just
replied to someone working on the macb driver who was proposing to
make use state->speed in his mac_config() method, despite modern
phylink always passing SPEED_UNKNOWN for that. I guess if I didn't
reply, he'd find out the hard way (which is why I made the change
to phylink_mac_config() to cause testing failures if people try
that.)

> > > The SFP case is clear, it's using host_interfaces. But in the built-in
> > > case, as of today, it always ends up in fixed interface mode with
> > > rate matching.
> >
> > "always rate matching" no. Fixed interface mode, yes. If
> > rtl822xb_config_init() sees phydev->interface is set to SGMII, then
> > it uses 2500BASEX_SGMII mode without rate matching - and the
> > advertisement will be limited to 1G and below which will effectively
> > prevent the PHY using 2.5G mode - which is fine in that case.
>
> While that case might be relevant for SFP modules, in pracise, why would
> anyone use a more expensive 2.5Gbit/s PHY on a board which only supports
> up to 1Gbit/s -- that doesn't happen in the real world, because 1Gbit/s
> SGMII PHYs are ubiquitous and much cheaper than faster ones.

I'm merely stating the logic that is there today.

> > > Let me summarize:
> > > - We can't just assume that every MAC which supports 2500Base-X also
> > > supports SGMII. It might support only 2500Base-X, or 2500Base-X and
> > > 1000Base-X, or the driver might not support switching between
> > > interface modes.
> > > - We can't rely on the PHY being pre-configured by the bootloader to
> > > either rate maching or interface-switching mode.
> > > - There are no strapping options for this, the only thing which is
> > > configured by strapping is the address.
> >
> > Right, so the only _safe_ thing to do is to assume that:
> >
> > 1. On existing PHY drivers which do not do any kind of interface
> > switching, retrofitting interface switching of any kind is unsafe.
>
> It's important to note that for the RealTek driver specifically we
> have done that in OpenWrt from day 1 -- simply because the rate-matching
> performed, especially by early versions of the PHY, is too bad. So we
> always made the driver switch interface to SGMII in case of link speeds
> slower than 2500M.
> Now, with the backport of upstream commits replacing our downstream
> patches, users started to complain that the issue of bad PHY performance
> in 1 Gbit/s mode has returned, which is the whole reason why I started
> to work on this issue.
>
> I understand, however, there may of course be other users of those
> RealTek 2.5G PHYs, even Rockchip with stmmac maybe, and that would
> break if we assume the MAC can support switching between 2500Base-X
> and SGMII, so users of those boards will have to live with the bad
> performance of the rate-matching performed by the PHY unless someone
> fixes the stmmac driver...

If OpenWRT's switching predates July 2017, then maybe they should've
been more pro-active at getting their patches upstream?

Unfortunately, in July 2017, there was nothing in mainline supporting
2500base*, and nothing doing any interface switching until I added
the Marvell 88x3310 driver which was where _I_ proposed interface
switching being added to phylib.

> > 2. On brand new PHY drivers which have no prior history, there can
> > not be any regressions, so implementing interface switching from
> > the very start is safe.
> >
> > The only way out of this is by inventing something new for existing
> > drivers that says "you can adopt a different behaviour" and that
> > must be a per-platform switch - in other words, coming from the
> > platform's firmware definition in some way.
>
> Why would it not just be the MAC driver which indicates that it can
> support switching to lower-speed interface modes it also supports?
> Do you really believe there are boards which are electrically
> unfit for performing SGMII on the traces intended for 2500Base-X?

For a single-lane serdes, what you say is true. However, it is not
universal across all interface modes.

As I stated in a previous discussion, if we have e.g. four lanes of
XAUI between a PHY and MAC, and both ends support XAUI, RXAUI,
10GBASE-R, 5GBASE-R, 2500BASE-X, and SGMII, it does not necessarily
follow that the platform can support 10GBASE-R and 5GBASE-R over
a single lane because the signalling rate is so much higher. Just
because the overall speed is the same or lower does not automatically
mean that it can be used.

> If by 'firmware' you mean 'device tree' then we are back on square
> one, and we would need several phy-connection-type aka. phy-mode
> listed there.
>
> After having read all the threads from 2021 you have provided links for,
> I believe that maybe an additional property which lists the interface
> modes to be used *optionally* and in addition to the primary (ie.
> fastest) mode stated in phy-mode or phy-connection-type could be a way
> out of this. It would still end up being potentially a longer list of
> interface modes, but reality is complex! Looking at other corners of DT
> it would still be rather simple and human readable (in contrast: take a
> look at inhumanly long arrays of gpio-line-names where even the order
> matters, for example...)
>
> Yet, it would still be a partial violation of Device Tree rules because
> we are (also) describing driver capabilities there then. What if, let's
> say one day stmmac *will* support interface mode switching? Should we
> update each and every board's device tree?

Well, I guess we need people that adopt phylink to actually implement
it properly rather than just slapping it into their driver in a way
that "works for them". :)

This is a battle that I've been trying for years with, but programmers
are lazy individuals who don't want to (a) read API documentation, (b)
implement things properly.

Anyway, I'm out of time right now to continue replying to this
conversation (it's taken over an hour so far to put what I've said
together, and I now have a meeting... so reached the end of what I can
do right now... so close to the end of your email too! Alas...

>
> Hence, unless there is a really good reason to believe that a board which
> works fine with 2500Base-X would not work equally well with SGMII, given
> that both the MAC (-driver) and PHY (-driver) support both modes, I don't
> see a need to burden firmware with having to list additional phy-modes.
>
> Of course, I'm always only talking about allowing switching to slower
> interface modes, and always only about modes which use a single pair
> differential lanes (ie. sgmii, 1000base-x, 2500base-x, 5gbase-r,
> 10gbase-r, ...).
>
> > > > > Afaik, practially all rate-matching PHYs which do support half-duplex
> > > > > modes on the TP interface can perform duplex-matching as well.
> > > >
> > > > So we should remove that restriction!
> > >
> > > Absolutely. That will solve at least half of the problem. It still
> > > leaves us with a SerDes clock running 2.5x faster than it would have to,
> > > PHY and MAC consuming more energy than they would have to and TX
> > > performance being slightly worse (visible with iperf3 --bidir at least
> > > with some PHYs). But at least the link would come up.
> >
> > It also means we move forward!
>
> ACK. Patch posted.
>

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!