Re: [PATCH] ixgbe: Manual AN-37 for troublesome link partners for X550 SFI

From: Andrew Lunn
Date: Mon Sep 09 2024 - 15:36:18 EST


> This was originally worked out by Doug Boom at Intel. It had to do
> with autonegotiation not being the part of the SFP optics when the
> Denverton X550 Si was released and was thus not POR for DNV. The
> Juniper switches however won't exit their AN sequence unless an AN37
> transaction is seen.

I wonder what 802.3 says about this. I suspect the Juniper switch is
within the standard here, and the X550 is broken.

> Other switch vendors recover gracefully when the right encoding is
> discovered, not using AN37 transactions, but not Juniper.

We have seen similar things in the Linux core PHY handling, but mostly
around 2500BaseX MAC and PHY drivers. A lot of vendors implement what
they call overclocked SGMII, rather than 2500BaseX. But SGMII
signalling makes no sense when overclocked to 2.5GHz, so they just
disable it, leaving no signalling at all. Some 2500BaseX PHYs handle
this: they gracefully fall back to sensible defaults when they
discover they are connected to a broken MAC. Others need to be told
they are connected to a broken MAC which does not perform signalling.
But it is easier for a MAC-PHY relationship: everything is on one
board, we know all the details, and can work around the issues.

> Since DNV doesn't do AN37 in SFP auto mode, there's an endless loop.
> (Technically, the switches *could* be updated to new firmware that
> should have this capability, but apparently a logistical issue for
> at least one of our customers.)

I would say that is the wrong solution; I don't think the switch is
doing anything wrong. But the devil is in the details, check 802.3.

> Going back through my emails, Doug did mention that it would
> possibly cause issues with other switches, but it wasn't anything
> we, or (until just recently) anyone else had observed. A quote from
> Doug:
>
> "that AN37 fix pretty much only works with the Juniper switches, and can cause problems with other switches."

LOS from the SFP cage will tell you there is something on the
other end of the link. It is not a particularly reliable signal, since
it just means there is light. Is there any indication the link is not
usable? You could wait 10 seconds after LOS is inactive, and if there
is no usable link kick off the workaround. If after 10 seconds the
link is still not usable, turn the workaround off again. Flip flop
every 10 seconds.

Hopefully the initial 10 seconds delay means you won't upset switches
which currently work, and after 10 seconds, you gain a link to
switches that really do expect AN37.

Andrew