Re: [EXT] Re: [PATCH] bus: fsl-mc: Add ACPI support for fsl-mc

From: Russell King - ARM Linux admin
Date: Fri Jan 31 2020 - 10:15:38 EST


On Fri, Jan 31, 2020 at 03:29:06PM +0100, Andrew Lunn wrote:
> > > But by design SFP, SFP+, and QSFP cages are not fixed function network
> > > adapters. They are physical and logical devices that can adapt to
> > > what is plugged into them. How the devices are exposed should be
> > > irrelevant to this conversation it is about the underlying
> > > connectivity.
> >
> > Apologies - I was under the impression that SFP and friends were a
> > physical-layer thing and that a MAC in the SoC would still be fixed such
> > that its DMA and interrupt configuration could be statically described
> > regardless of what transceiver was plugged in (even if some configurations
> > might not use every interrupt/stream ID/etc.) If that isn't the case I shall
> > go and educate myself further.
>
> Hi Robin
>
> It gets interesting with QSFP cages. The Q is quad, there are 4 SERDES
> lanes. You can use them for 1x 40G link, or you can split them into 4x
> 10G links. So you either need one MAC or 4 MACs connecting to the
> cage, and this can change on the fly when a module is ejected
> replaced with another module.

I think it's even more complicated than that. If you have a QSFP+
fiber module, that can be connected to four fibers which can either
go to another QSFP+ module, or four separate SFP+ modules.

That means it's a manual configuration decision whether to operate
the QSFP+ module as a single 40G link, or as four separate 10G links.
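
A minimal sketch of that choice, assuming the breakout mode is handed to the driver as a manual configuration setting. The enum and qsfp_map_lanes() are hypothetical names for illustration, not an existing kernel interface. All it shows is that the same four lanes map to either one link or four, and therefore to one MAC/netdev or four.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: how the four SERDES lanes of a QSFP+ cage are grouped.
 * This is a per-installation decision; the module cannot tell us. */
enum qsfp_breakout {
	QSFP_1X40G,	/* all four lanes form one 40G link: one MAC */
	QSFP_4X10G,	/* each lane is an independent 10G link: four MACs */
};

#define QSFP_LANES 4

/* Fill lane_to_link[] with the logical link index each lane belongs to
 * and return how many links (MACs / network interfaces) are needed. */
static size_t qsfp_map_lanes(enum qsfp_breakout mode, int *lane_to_link)
{
	size_t lane;

	assert(lane_to_link != NULL);

	for (lane = 0; lane < QSFP_LANES; lane++)
		lane_to_link[lane] = (mode == QSFP_1X40G) ? 0 : (int)lane;

	return (mode == QSFP_1X40G) ? 1 : QSFP_LANES;
}
```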

> There is only one set of control pins
> for i2c, loss of signal, TX disable, module inserted. So where the
> interrupt/stream ID/etc are mapped needs some flexibility.

QSFP changes the way the modules are controlled; gone are many of the
hardware signals, replaced by registers in the I2C space. The
remaining hardware signals are:

ModSelL  module select (to enable the I2C bus)
ResetL   module reset
SCL/SDA  I2C bus
ModPrsL  module present
IntL     interrupt (but not too useful from what I can see!)
LPMode   low power mode (can be overridden via the I2C bus)

> There is also to some degree a conflict with hiding all this inside
> firmware. This is complex stuff. It is much better to have one core
> implementation in Linux plus some per hardware driver support, than
> having X firmware blobs, generally closed source, each with their own
> bugs which nobody can fix.

QSFP and SFP support is not really part of the DPAA2 firmware.

I have some prototype implementation for driving the QSFP+ cage, but
I haven't yet worked out how to sensibly deal with the "is it 4x 10G
or 1x 40G" issue you mention above, and how to interface the QSFP+
driver sensibly with one or four network drivers.

I've been concentrating more on the SFP/SFP+ problem on the Honeycomb
board, which is what most people will have, working out how to drive
the hardware so that our existing SFP support in the kernel works
properly. In the last couple of days, I've managed to get
something together which works, switching between 1000base-X and SGMII
on this hardware, using some of the patches I've already pointed to
over the last few weeks. This hardware falls into the "split PCS and
MAC" problem space, so it's relevant to many people - and it's
important that we don't rush into a solution that works for one
implementation and not everyone. This is why I haven't responded to
Jose's proposal - I'm still working out what is required for others,
but what I can say is that it isn't what Jose has proposed. I had
asked Jose to hold off, but he's understandably eager to solve the
problem in front of him at the expense of everyone else.
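
Part of what makes switching a PCS between 1000base-X and SGMII awkward is that the 16-bit link partner word from in-band negotiation means different things in each mode. A sketch under that assumption, using the bit positions of the SGMII control word (bits 11:10 speed, bit 12 duplex) and the IEEE 802.3 Clause 37 base page (bit 5 full duplex, bits 7/8 pause). The type and function names are hypothetical and not taken from the cex7 branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum pcs_mode { PCS_MODE_SGMII, PCS_MODE_1000BASEX };

/* Hypothetical result of decoding the link partner word. */
struct an_result {
	int speed;		/* Mb/s, 0 if the encoding is reserved */
	bool full_duplex;
	bool pause;		/* partner advertises symmetric pause */
	bool asym_pause;	/* partner advertises asymmetric pause */
};

static struct an_result decode_lpa(enum pcs_mode mode, uint16_t lpa)
{
	struct an_result r = { 0 };

	switch (mode) {
	case PCS_MODE_SGMII:
		/* The PHY reports its resolved speed/duplex; SGMII carries
		 * no pause information, that has to come from the PHY. */
		switch ((lpa >> 10) & 3) {
		case 0: r.speed = 10; break;
		case 1: r.speed = 100; break;
		case 2: r.speed = 1000; break;
		default: r.speed = 0; break;	/* reserved */
		}
		r.full_duplex = lpa & (1u << 12);
		break;
	case PCS_MODE_1000BASEX:
		/* Clause 37: speed is always 1000; duplex and pause are
		 * advertised directly by the link partner. */
		r.speed = 1000;
		r.full_duplex = lpa & (1u << 5);
		r.pause = lpa & (1u << 7);
		r.asym_pause = lpa & (1u << 8);
		break;
	}
	return r;
}
```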

What I've found is that any attempt to split the current
"phylink_mac_ops" interface between the PCS and MAC blocks results, as
I suspected, in mvneta and mvpp2 suffering very badly; the hardware
does not split along those functional blocks at all well.

My current state of play for this is in my "cex7" branch, pushed out
earlier today. It's a bit hacky right now, and there are various issues
that need to be solved, but it is functional with the right board boot
configuration (basically the DPC file, which is one of the configs for
the MC firmware.)

I'm planning to look at what's required for the faster speeds; there are
other PCS PHYs on this platform that support the other speeds (10G, 25G,
40G, 100G) accessed via Clause 45 cycles.
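
For reference, a Clause 45 access is indirect: an address frame first selects a register within a device (MMD; the PCS is MMD 3 in IEEE 802.3 numbering), then a read or write frame transfers the data. A minimal sketch encoding the 32 bits that follow the preamble; the helper names are hypothetical, and the turnaround field is simplified to the "10" an address/write frame uses.

```c
#include <assert.h>
#include <stdint.h>

/* Clause 45 MDIO opcodes. */
enum c45_op {
	C45_OP_ADDRESS	= 0x0,
	C45_OP_WRITE	= 0x1,
	C45_OP_READ_INC	= 0x2,	/* post-read-increment-address */
	C45_OP_READ	= 0x3,
};

/* Layout: ST(31:30)=00 | OP(29:28) | PRTAD(27:23) | DEVAD(22:18) |
 * TA(17:16)=10 | ADDR/DATA(15:0). On a read, the PHY drives the data
 * field; it is encoded as whatever 'data' is passed (normally 0). */
static uint32_t c45_frame(enum c45_op op, unsigned int prtad,
			  unsigned int devad, uint16_t data)
{
	assert(prtad < 32 && devad < 32);
	return ((uint32_t)op << 28) | ((uint32_t)prtad << 23) |
	       ((uint32_t)devad << 18) | (0x2u << 16) | data;
}

/* A register read is two transactions: set the address, then read. */
static void c45_read_frames(unsigned int prtad, unsigned int devad,
			    uint16_t reg, uint32_t out[2])
{
	out[0] = c45_frame(C45_OP_ADDRESS, prtad, devad, reg);
	out[1] = c45_frame(C45_OP_READ, prtad, devad, 0);
}
```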

As for the DSA issue you've raised with DSA links, I don't see any
obvious solution for that - the whole "if no fixed-link is specified,
default to the highest speed" is a real problem; the conversion of DSA
to phylink for the CPU and DSA ports did not take account of that.
phylink has _zero_ information in that case to know how the link should
be configured - there is no PHY, there is no fixed-link specification,
there is absolutely nothing. So it's no surprise when phylink tries to
configure speed=0 duplex=half pause=off on these interfaces when they're
brought up. I notice that this work was contributed by NXP - and in my
mind illustrates that they did not think about what they were doing
there either. They certainly never ran phylink with debugging on and
considered whether the phylink_mac_config() calls contained sensible
information. Did they even have all the information necessary to work
out what was required - I doubt it very much. Did they realise that the
fixed-link specification was optional, did they realise that there
could be a PHY on these links, and did they consider what the behaviour
would be in those cases? And now we have something of a headache
trying to work out how to solve this - one thing is certain, whatever
the fix is, it isn't going to be nice to backport to stable trees.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up