On Tue, Oct 05, 2021 at 12:45:28PM -0400, Sean Anderson wrote:
On 10/5/21 6:33 AM, Russell King (Oracle) wrote:
> On Mon, Oct 04, 2021 at 03:15:27PM -0400, Sean Anderson wrote:
> > Some modules have something at SFP_PHY_ADDR which isn't a PHY. If we try to
> > probe it, we might attach genphy anyway if addresses 2 and 3 return
> > something other than all 1s. To avoid this, add a quirk for these modules
> > so that we do not probe their PHY.
> >
> > The particular module in this case is a Finisar SFP-GB-GE-T. This module is
> > also worked around in xgbe_phy_finisar_phy_quirks() by setting the support
> > manually. However, I do not believe that it has a PHY in the first place:
> >
> > $ i2cdump -y -r 0-31 $BUS 0x56 w
> > 0,8 1,9 2,a 3,b 4,c 5,d 6,e 7,f
> > 00: ff01 ff01 ff01 c20c 010c 01c0 0f00 0120
> > 08: fc48 000e ff78 0000 0000 0000 0000 00f0
> > 10: 7800 00bc 0000 401c 680c 0300 0000 0000
> > 18: ff41 0000 0a00 8890 0000 0000 0000 0000
>
> Actually, I think that is a PHY. It's byteswapped (which is normal using
> i2cdump in this way). The real contents of the registers are:
>
> 00: 01ff 01ff 01ff 0cc2 0c01 c001 000f 2001
> 08: 48fc 0e00 78ff 0000 0000 0000 0000 f000
> 10: 0078 bc00 0000 1c40 0c68 0003 0000 0000
> 18: 41ff 0000 000a 9088 0000 0000 0000 0000
Ah, thanks for catching this.
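
For reference, the byte swap described above can be reproduced with a short sketch. It assumes the swap comes from i2cdump's word mode returning the low byte first (SMBus word order) while the PHY sends each 16-bit register most-significant byte first. The helper names `swap16` and `swap_row` are made up for illustration; the input rows are copied from the dump quoted above.

```python
def swap16(word: int) -> int:
    """Swap the two bytes of a 16-bit value."""
    return ((word & 0xFF) << 8) | ((word >> 8) & 0xFF)


def swap_row(row: str) -> str:
    """Byte-swap every 16-bit hex word in one row of i2cdump word output."""
    return " ".join(f"{swap16(int(word, 16)):04x}" for word in row.split())


# Raw rows as printed by "i2cdump -y -r 0-31 $BUS 0x56 w" in this thread.
RAW_ROWS = [
    "ff01 ff01 ff01 c20c 010c 01c0 0f00 0120",
    "fc48 000e ff78 0000 0000 0000 0000 00f0",
    "7800 00bc 0000 401c 680c 0300 0000 0000",
    "ff41 0000 0a00 8890 0000 0000 0000 0000",
]

if __name__ == "__main__":
    # Each row holds eight registers, so row offsets are 0x00, 0x08, 0x10, 0x18.
    for offset, row in zip(range(0, 0x20, 8), RAW_ROWS):
        print(f"{offset:02x}: {swap_row(row)}")
```

Run on the raw rows, this should give the same four rows as the corrected table quoted above.
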
> It's advertising pause + asym pause, 1000BASE-T FD, link partner is also
> advertising 1000BASE-T FD but no pause abilities.
>
> When comparing this with a Marvell 88e1111:
>
> 00: 1140 7949 0141 0cc2 05e1 0000 0004 2001
> 08: 0000 0e00 4000 0000 0000 0000 0000 f000
> 10: 0078 8100 0000 0040 0568 0000 0000 0000
> 18: 4100 0000 0002 8084 0000 0000 0000 0000
>
> It looks remarkably similar. However, the first few reads seem to be
> corrupted with 0x01ff. It may be that the module is slow to allow the
> PHY to start responding - we've had similar with Champion One SFPs.
Do you have an example of how to work around this? Even reading one
register at a time I still get the bogus 0x01ff. Reading bytewise, a
reasonable-looking upper byte is returned every other read, but the
lower byte is 0xff every time.
I think the Champion One modules just don't respond to the I2C
transactions, so we keep retrying for a while. We try every
50ms for 12 retries, which seems to be long enough for their
modules.
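
As a rough illustration of that retry scheme (not the kernel code itself), the loop looks something like the sketch below. The function name `probe_with_retries` and the `probe` callable are hypothetical, a failed I2C transaction is assumed to surface as `OSError`, and whether "12 retries" means 12 or 13 attempts in total is glossed over here.

```python
import time


def probe_with_retries(probe, attempts=12, delay_s=0.05, sleep=time.sleep):
    """Call probe() until it succeeds, waiting delay_s between failed tries.

    probe    -- hypothetical callable performing one I2C access; it is
                assumed to raise OSError when the module does not respond
    attempts -- total number of tries (assumption: 12, per the text above)
    delay_s  -- delay between tries in seconds (50 ms)
    sleep    -- injectable so the loop can be exercised without waiting
    """
    for attempt in range(1, attempts + 1):
        try:
            return probe()
        except OSError:
            if attempt == attempts:
                raise  # out of tries: report the last failure
            sleep(delay_s)
```

Note that this only covers a module that does not answer at all. It would not help with a module that answers with wrong data, which is what the 0x01ff reads look like.
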
> It looks like it's a Marvell 88e1111. The register at 0x11 is the
> Marvell status register, and 0xbc00 indicates 1000Mbit, FD, AN
> resolved, link up which agrees with what's in the various other
> registers.
That matches some supplemental info on the manufacturer's website
(which was frustratingly not associated with the model number of
this particular module).
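
To make the 0xbc00 reading concrete, here is a small decoding sketch. The bit masks are written from memory of the Marvell PHY-specific status register (0x11) layout as used by the Linux marvell driver, so treat them as assumptions and check them against the datasheet. The function name `decode_status` is made up for this example.

```python
# Assumed bit layout of the Marvell PHY-specific status register (0x11).
SPEED_MASK = 0xC000     # bits 15:14: 00 = 10, 01 = 100, 10 = 1000 Mbit/s
FULL_DUPLEX = 0x2000    # bit 13: full duplex
RESOLVED = 0x0800       # bit 11: speed and duplex resolved
LINK_UP = 0x0400        # bit 10: link up (real time)

SPEEDS_MBPS = {0x0000: 10, 0x4000: 100, 0x8000: 1000}


def decode_status(value: int) -> dict:
    """Decode the fields of register 0x11 discussed in this thread."""
    return {
        # None for the reserved encoding 0b11
        "speed_mbps": SPEEDS_MBPS.get(value & SPEED_MASK),
        "full_duplex": bool(value & FULL_DUPLEX),
        "resolved": bool(value & RESOLVED),
        "link_up": bool(value & LINK_UP),
    }


if __name__ == "__main__":
    print(decode_status(0xBC00))  # value read from the Finisar module
    print(decode_status(0x8100))  # value from the 88e1111 comparison dump
```

Under those assumptions, 0xbc00 decodes to 1000 Mbit/s, full duplex, resolved, link up, which matches the reading above.
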
The interesting thing is, many modules use 88e1111, which is about
the only PHY that I'm aware of that supports I2C access mode natively.
So, it's really surprising that you're getting corrupted data,
unless...
There's been a history of using too strong pull-ups on the SFP I2C
lines. The SFP MSA gives a minimum value of the resistors (4.7k).
SFP+ lowers the minimum value and raises the maximum clock frequency.
Some SFP modules are unable to drive the I2C bus low against the
lower resistances, resulting in corrupted data (or worse, corrupted
EEPROMs).
Other problems on some platforms have been with I2C level shifters
locking up, but that doesn't look like what's happening here - they
lock up at logic low, not logic high. Even so-called "impossible to
lock up" level shifters have locked up despite their manufacturer
stating that it is impossible.
Is it always the same addresses?
What if you read from a different offset?
What if you re-read after it seems to have cleared?