RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
From: Don Bollinger
Date: Fri Mar 26 2021 - 14:44:22 EST
> > > Our experience is that a number of SFPs are broken, they don't
> > > follow the standard. Some you cannot perform more than 16 bytes
> > > reads without them locking up. Others will perform a 16 byte read,
> > > but only give you one
> > useful
> > > byte of data. So you have to read enough of the EEPROM a byte at a
> > > time to get the vendor and product strings in order to determine
> > > what quirks need to be applied. optoe has nothing like this. Either
> > > you don't care and only support well behaved SFPs, or you have the
> > > quirk handling in user space,
> > in
> > > the various vendor code blobs, repeated again and again. To make
> > > optoe generically usable, you are going to have to push the quirk
> > > handling into optoe. The brokenness should be hidden from userspace.
> >
> > Interesting. I would throw away such devices. That's why switch
> > vendors publish supported parts lists.
> >
> > Can you point me to the code that is handling those quirks? Since I
> > haven't seen those problems, I don't know what they are and how to
> address them.
>
> Take a look in drivers/net/phy/sfp.c
Thanks for the pointers, I appreciate the chance to understand exactly what
quirks are under discussion.
It turns out that those quirks are not relevant to my patch. I don't mean
you are wrong, or that they don't have to be handled for your use cases,
just that my patch does not need to deal with them.
If optoe were adopted into your framework, it would replace the routines
sfp_i2c_read() and sfp_i2c_write(). By the time these two routines are
called, all of the quirk handling has already been applied (there is no
quirk handling in these routines today). The data requests have been broken
down as necessary into short reads or targeted reads for each device. Those
calls are then sent to the module EEPROM with no further need for quirk
handling. So, optoe does not need to handle quirks to fit into your code.
The quirks are handled, without optoe being involved.
In my community, the SFP/QSFP/CMIS devices (32 to 128 of them per switch)
often cost more than the switch itself. Consumers (both vendors and
customers) extensively test these devices to ensure correct and reliable
operation. Then they buy them literally by the millions. Quirks lead to
quick rejection in favor of reliable parts from reliable vendors. In this
environment, for completely different reasons, optoe does not need to handle
quirks.
>
> commit f0b4f847673299577c29b71d3f3acd3c313d81b7
> Author: Pali Rohár <pali@xxxxxxxxxx>
> Date: Mon Jan 25 16:02:28 2021 +0100
>
> net: sfp: add mode quirk for GPON module Ubiquiti U-Fiber Instant
>
> The Ubiquiti U-Fiber Instant SFP GPON module has nonsensical
information
> stored in its EEPROM. It claims to support all transceiver types
including
> 10G Ethernet. Clear all claimed modes and set only 1000baseX_Full,
which
> is
> the only one supported.
>
> This module has also phys_id set to SFF, and the SFP subsystem
currently
> does not allow to use SFP modules detected as SFFs. Add exception for
this
> module so it can be detected as supported.
In my community, all interpretation of the SFP/QSFP/CMIS content is done in
user space. So, these issues don't affect optoe. I know you find that
distasteful, but it is what my community wants and needs.
>
> This change finally allows to detect and use SFP GPON module Ubiquiti
> U-Fiber Instant on Linux system.
>
> EEPROM content of this SFP module is (where XX is serial number):
>
> 00: 02 04 0b ff ff ff ff ff ff ff ff 03 0c 00 14 c8
???........??.??
> 10: 00 00 00 00 55 42 4e 54 20 20 20 20 20 20 20 20 ....UBNT
> 20: 20 20 20 20 00 18 e8 29 55 46 2d 49 4e 53 54 41
.??)UF-INSTA
> 30: 4e 54 20 20 20 20 20 20 34 20 20 20 05 1e 00 36 NT 4
??.6
> 40: 00 06 00 00 55 42 4e 54 XX XX XX XX XX XX XX XX
.?..UBNTXXXXXXXX
> 50: 20 20 20 20 31 34 30 31 32 33 20 20 60 80 02 41 140123
`??A
>
>
> commit 426c6cbc409cbda9ab1a9dbf15d3c2ef947eb8c1
> Author: Pali Rohár <pali@xxxxxxxxxx>
> Date: Mon Jan 25 16:02:27 2021 +0100
>
> net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips
>
> The workaround for VSOL V2801F brand based GPON SFP modules added
> in commit
> 0d035bed2a4a ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0
> workaround") works only for IDs added explicitly to the list. Since
there
> are rebranded modules where OEM vendors put different strings into the
> vendor name field, we cannot base workaround on IDs only.
You might be fighting a losing battle here. There are companies that buy
cheap SFP devices and reprogram the EEPROM to identify them as higher
quality parts from reliable vendors. Checking the Vendor ID and part number
may tell you have a high quality part, but the code inside still has the
quirks you are trying to avoid.
I developed optoe while working for a (high quality) vendor of these
modules. Their support team runs into these counterfeit parts all the time.
(I don't work for them or anyone else any more.)
>
> Moreover the issue which the above mentioned commit tried to work
> around is
> generic not only to VSOL based modules, but rather to all GPON modules
> based on Realtek RTL8672 and RTL9601C chips.
>
> These include at least the following GPON modules:
> * V-SOL V2801F
> * C-Data FD511GX-RM0
> * OPTON GP801R
> * BAUDCOM BD-1234-SFM
> * CPGOS03-0490 v2.0
> * Ubiquiti U-Fiber Instant
> * EXOT EGS1
>
> These Realtek chips have broken EEPROM emulator which for N-byte read
> operation returns just the first byte of EEPROM data, followed by N-1
> zeros.
>
> Introduce a new function, sfp_id_needs_byte_io(), which detects SFP
> modules
> with broken EEPROM emulator based on N-1 zeros and switch to 1 byte
> EEPROM
> reading operation.
>
> Function sfp_i2c_read() now always uses single byte reading when it is
> required and when function sfp_hwmon_probe() detects single byte
> access,
> it disables registration of hwmon device, because in this case we
cannot
> reliably and atomically read 2 bytes as is required by the standard
for
> retrieving values from diagnostic area.
>
> (These Realtek chips are broken in a way that violates SFP standards
for
> diagnostic interface. Kernel in this case simply cannot do anything
less
> of skipping registration of the hwmon interface.)
>
> This patch fixes reading of EEPROM content from SFP modules based on
> Realtek RTL8672 and RTL9601C chips. Diagnostic interface of EEPROM
stays
> broken and cannot be fixed.
>
> commit 624407d2cf14ff58e53bf4b2af9595c4f21d606e
> Author: Russell King <rmk+kernel@xxxxxxxxxxxxxxx>
> Date: Sun Jan 10 10:58:32 2021 +0000
>
> net: sfp: cope with SFPs that set both LOS normal and LOS inverted
>
> The SFP MSA defines two option bits in byte 65 to indicate how the
> Rx_LOS signal on SFP pin 8 behaves:
>
> bit 2 - Loss of Signal implemented, signal inverted from standard
> definition in SFP MSA (often called "Signal Detect").
> bit 1 - Loss of Signal implemented, signal as defined in SFP MSA
> (often called "Rx_LOS").
>
> Clearly, setting both bits results in a meaningless situation: it
would
> mean that LOS is implemented in both the normal sense (1 = signal
loss)
> and inverted sense (0 = signal loss).
>
> Unfortunately, there are modules out there which set both bits, which
> will be initially interpret as "inverted" sense, and then, if the LOS
> signal changes state, we will toggle between LINK_UP and WAIT_LOS
> states.
>
> Change our LOS handling to give well defined behaviour: only interpret
> these bits as meaningful if exactly one is set, otherwise treat it as
> if LOS is not implemented.
>
> ommit 0d035bed2a4a6c4878518749348be61bf082d12a
> Author: Russell King <rmk+kernel@xxxxxxxxxxxxxxx>
> Date: Wed Dec 9 11:22:49 2020 +0000
>
> net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 workaround
>
> Add a workaround for the detection of VSOL V2801F / CarlitoxxPro
> CPGOS03-0490 v2.0 GPON module which CarlitoxxPro states needs single
> byte I2C reads to the EEPROM.
>
> Pali Rohár reports that he also has a CarlitoxxPro-based V2801F
module,
> which reports a manufacturer of "OEM". This manufacturer can't be
> matched as it appears in many different modules, so also match the
part
> number too.
>
> etc.
>
> Now, it could be you only work with TOR switches and their SFP modules.
> And they follow the standard in a better way than others. But the kernel
is
> used in many more environments that just data centers.
> We need to support SFPs in FTTH boxes, in aircraft inflight entertainment
> systems, industrial control etc.
Totally agree. As a replacement for sfp_i2c_read() and sfp_i2c_write(),
optoe would achieve everything the existing code does, plus it would support
paged devices. Works the same, but covers more capabilities.
AND, for TOR switches, and their SFP AND QSFP AND CMIS modules, it works
too, even when they don't adopt linux kernel networking. Unfortunately you
have decided that their architecture, handling networking outside the
kernel, is not allowed. My community does not get to have the perfectly
good code they use to access these devices accepted into the mainline
kernel.
>
>
> Andrew
Don