Re: [RFC 2/3] net: Provide switchdev driver for NXP's More Than IP L2 switch

From: Lukasz Majewski
Date: Tue Jun 29 2021 - 08:01:25 EST

In reply to: Vladimir Oltean: "Re: [RFC 2/3] net: Provide switchdev driver for NXP's More Than IP L2 switch"
Next in thread: Andrew Lunn: "Re: [RFC 2/3] net: Provide switchdev driver for NXP's More Than IP L2 switch"

Hi Vladimir,

> On Tue, Jun 29, 2021 at 10:09:37AM +0200, Lukasz Majewski wrote:
> > Hi Vladimir,
> >
> > > On Mon, Jun 28, 2021 at 04:13:14PM +0200, Lukasz Majewski wrote:
> > > > > > > > So before considering merging your changes, I would like
> > > > > > > to see a usable binding.
> > > > > > >
> > > > > > > I also don't remember seeing support for STP. Without
> > > > > > > that, your network has broadcast storm problems when
> > > > > > > > there are loops. So I would like to see the code needed
> > > > > > > to put ports into blocking, listening, learning, and
> > > > > > > forwarding states.
> > > > > > >
> > > > > > > Andrew
> > > > >
> > > > > I cannot stress enough how important it is for us to see STP
> > > > > support and consequently the ndo_start_xmit procedure for
> > > > > switch ports.
> > > >
> > > > Ok.
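
For reference, the STP support requested above comes down to translating
the bridge port state into per-port receive, learning and forwarding
controls. A minimal sketch in plain C, not kernel code: the enum values
mirror the BR_STATE_* constants from include/uapi/linux/if_bridge.h,
while struct port_cfg and stp_state_to_cfg() are hypothetical names, and
the real MTIP register layout is not discussed in this thread.

```c
#include <stdbool.h>

/* Values mirror BR_STATE_* in include/uapi/linux/if_bridge.h. */
enum stp_state {
	STP_DISABLED   = 0,
	STP_LISTENING  = 1,
	STP_LEARNING   = 2,
	STP_FORWARDING = 3,
	STP_BLOCKING   = 4,
};

/* Hypothetical per-port controls; not the real MTIP register layout. */
struct port_cfg {
	bool rx_bpdu;    /* BPDUs are still passed up to the CPU */
	bool learning;   /* source addresses are learned into the FDB */
	bool forwarding; /* data frames are forwarded */
};

/* Translate an 802.1D port state into the three per-port controls. */
static struct port_cfg stp_state_to_cfg(enum stp_state state)
{
	struct port_cfg cfg = { false, false, false };

	switch (state) {
	case STP_BLOCKING:
	case STP_LISTENING:
		cfg.rx_bpdu = true;
		break;
	case STP_LEARNING:
		cfg.rx_bpdu = true;
		cfg.learning = true;
		break;
	case STP_FORWARDING:
		cfg.rx_bpdu = true;
		cfg.learning = true;
		cfg.forwarding = true;
		break;
	case STP_DISABLED:
	default:
		break;
	}
	return cfg;
}
```

A driver would call something like this from its handler for the port
STP state attribute and then write the result to the hardware.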
> > > >
> > > > > Let me see if I understand correctly. When the switch is
> > > > > enabled, eth0 sends packets towards both physical switch
> > > > > ports, and eth1 sends packets towards none, but eth0 handles
> > > > > the link state of switch port 0, and eth1 handles the link
> > > > > state of switch port 1?
> > > >
> > > > Exactly, this is how the FEC driver is used for this switch.
> > >
> > > This is a much bigger problem than anything which has to do with
> > > code organization. Linux does not have any sort of support for
> > > unmanaged switches.
> >
> > My impression is similar. This switch cannot easily fit into DSA
> > (it does not append tags)
>
> No, this is not why the switch does not fit the DSA model.
> DSA assumes that the master interface and the switch are two
> completely separate devices which manage themselves independently.
> Their boundary is typically at the level of a MAC-to-MAC connection,
> although vendors have sometimes blurred this line a bit in the case
> of integrated switches. But the key point is that if there are 2
> external ports going to the switch, these should be managed by the
> switch driver. But when the switch is sandwiched between the Ethernet
> controller of the "DSA master" (the DMA engine of fec0) and the DSA
> master's MAC (still owned by fec), the separation isn't quite what
> DSA expects, is it? Remember that in the case of the MTIP switch, the
> fec driver needs to put the MACs going to the switch in promiscuous
> mode such that the switch behaves as a switch and actually forwards
> packets by MAC DA instead of dropping them. So the system is much
> more tightly coupled.
>
> +---------------------------------------------------------------------------+
> |                                                                           |
> | +--------------+     +--------------------+--------+      +------------+  |
> | |              |     | MTIP switch        |        |      |            |  |
> | |  fec 1 DMA   |---x |                    | Port 2 |------| fec 1 MAC  |  |
> | |              |     |            \  /    |        |      |            |  |
> | +--------------+     |             \/     +--------+      +------------+  |
> |                      |             /\              |                      |
> | +--------------+     +--------+   /  \    +--------+      +------------+  |
> | |              |     |        |           |        |      |            |  |
> | |  fec 0 DMA   |-----| Port 0 |           | Port 1 |------| fec 0 MAC  |  |
> | |              |     |        |           |        |      |            |  |
> | +--------------+     +--------+-----------+--------+      +------------+  |
> |                                                                           |
> +---------------------------------------------------------------------------+
>
> Is this DSA? I don't really think so, but you could still try to argue
> otherwise.
>
> The opposite is also true. DSA supports switches that don't append
> tags to packets (see sja1105). This doesn't make them "less DSA",
> just more of a pain to work with.
>
> > nor into switchdev.
> >
> > The latter is caused by two modes of operation:
> >
> > - Bypass mode (no switch) -> DMA1 and DMA0 are used
> > - Switch mode -> only DMA0 is used
> >
> >
> > Moreover, as far as I understand the CPSW, it always uses just a
> > single DMA, and switching seems to be the default mode of operation
> > for its two Ethernet ports.
> >
> > The equivalent of the "bypass mode" of NXP's L2 switch seems to be
> > achieved inside the CPSW by configuring it not to pass packets
> > between those ports.
>
> I don't exactly see the point you're trying to make here. At the end
> of the day, the only thing that matters is what you expose to the
> user. With no way (when the switch is enabled) for a socket opened on
> eth0 to send/receive packets coming only from the first port, and a
> socket opened on eth1 to send/receive packets coming only from the
> second port, I think this driver attempt is a pretty far cry from
> what a switch driver in Linux is expected to offer, be it modeled as
> switchdev or DSA.
>
> > > Please try to find out if your switch is supposed to be able
> > > to be managed (run control protocols on the CPU).
> >
> > It can support all the "normal" set of L2 switch features:
> >
> > - VLANs, lookup table (with learning), filtering and forwarding
> > (Multicast, Broadcast, Unicast), priority queues, IP snooping,
> > etc.
> >
> > BPDU frames are recognized by the switch and can be used to
> > implement support for RSTP. However, this switch has a separate
> > address space (neither covered by nor accessed through the FEC
> > address range).
> >
> > > If not, well, I
> > > don't know what to suggest.
> >
> > To me it looks like NXP's L2 switch should be treated _just_ as an
> > offloading IP block that accelerates switching (NXP already
> > supports dpaa[2], for example).
> >
> > The idea of having it configured on demand, when running:
> > ip link add name br0 type bridge; ip link set br0 up;
> > ip link set eth0 master br0;
> > ip link set eth1 master br0;
> >
> > seems to be a reasonable one. In the above scenario it would work
> > hand in hand with the FEC drivers (as those would handle PHY
> > communication setup and link up/down events).
>
> You seem to imply that we are suggesting something different.
>
> > It would be welcome if the community could come up with some rough
> > idea of how to proceed with support for this IP block
>
> Ok, so what I would do if I really cared that much about mainline
> support is I would refactor the FEC driver to offer its core
> functionality to a new multi-port driver that is able to handle the
> FEC DMA interfaces, the MACs and the switch. EXPORT_SYMBOL_GPL is your
> friend.
>
> This driver would probe on a device tree binding with 3 "reg" values:
> 1 for the fec@800f0000, 1 for the fec@800f4000 and 1 for the
> switch@800f8000. No puppet master driver which coordinates other
> drivers, just a single driver that, depending on the operating state,
> manages all the SoC resources in a way that will offer a sane and
> consistent view of the Ethernet ports.
>
> So it will have a different .ndo_start_xmit implementation depending
> on whether the switch is bypassed or not (if you need to send a
> packet on eth1 and the switch is bypassed, you send it through the
> DMA interface of eth1, otherwise you send it through the DMA
> interface of eth0 in a way in which the switch will actually route it
> to the eth1 physical port).
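
The mode-dependent transmit path described above could be sketched
roughly as follows. This is plain C rather than kernel code, and every
name in it (mtip_mode, xmit_route, mtip_pick_route) is hypothetical. The
mapping of eth0/eth1 to switch ports 1/2 is an assumption read off the
diagram earlier in this mail, where port 0 faces the fec 0 DMA.

```c
/* Hypothetical names throughout; nothing here is taken from the FEC driver. */
enum mtip_mode {
	MTIP_MODE_BYPASS, /* switch disabled, each port has its own DMA */
	MTIP_MODE_SWITCH, /* switch enabled, only the fec 0 DMA is connected */
};

struct xmit_route {
	int dma;                /* index of the FEC DMA interface to use */
	unsigned int port_mask; /* switch egress ports, 0 if the switch is bypassed */
};

/*
 * Pick the transmit route for a user-visible port (0 for eth0, 1 for eth1).
 * Assumption: eth0 sits behind switch port 1 and eth1 behind switch port 2.
 */
static struct xmit_route mtip_pick_route(enum mtip_mode mode, int port)
{
	struct xmit_route route;

	if (mode == MTIP_MODE_BYPASS) {
		/* Each interface transmits through its own DMA engine. */
		route.dma = port;
		route.port_mask = 0;
	} else {
		/* Everything goes through DMA 0, steered to one egress port. */
		route.dma = 0;
		route.port_mask = 1u << (port + 1);
	}
	return route;
}
```

A single .ndo_start_xmit implementation could consult such a helper per
packet, or the driver could install a different handler for each mode.
How the egress port is actually communicated to the hardware is left
open here.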
>
> Then I would implement support for BPDU RX/TX. I haven't looked at the
> documentation, but I expect that what this switch offers for control
> traffic doesn't scale at high speeds (if it does, great, then send and
> receive all your packets as control packets, to have precise port
> identification). If it doesn't, you'll need a way to treat your data
> plane packets differently from the control plane packets. For the data
> plane, you can perhaps borrow some ideas from net/dsa/tag_8021q.c, or
> even from Tobias Waldekranz's proposal to just let data plane packets
> coming from the bridge slide into the switch with no precise control
> of the destination port at all, just let the switch perform FDB
> lookups for those packets because the switch hardware FDB is supposed
> to be more or less in sync with the bridge software FDB:
> https://patchwork.kernel.org/project/netdevbpf/cover/20210426170411.1789186-1-tobias@xxxxxxxxxxxxxx/
>
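
To treat control plane packets differently from data plane packets, the
receive path first has to recognize them. A minimal sketch, assuming
classification by destination MAC only: STP and RSTP BPDUs are sent to
the 802.1D bridge group address 01:80:C2:00:00:00. The function name and
MAC_LEN are hypothetical; in-kernel code would more likely rely on
is_link_local_ether_addr() from <linux/etherdevice.h>, which matches the
whole 01:80:C2:00:00:0X range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* 802.1D bridge group address, the destination of STP/RSTP BPDUs. */
static const uint8_t stp_group_addr[MAC_LEN] = {
	0x01, 0x80, 0xc2, 0x00, 0x00, 0x00
};

/*
 * Return true if the frame is addressed to the bridge group address.
 * The destination MAC occupies the first six bytes of an Ethernet frame.
 */
static bool frame_is_bpdu(const uint8_t *frame, size_t len)
{
	if (len < MAC_LEN)
		return false;
	return memcmp(frame, stp_group_addr, MAC_LEN) == 0;
}
```

Frames for which this returns true would take the control path with
precise port identification; everything else would follow whichever data
plane scheme is chosen.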

Thanks for sketching and sharing such a detailed plan.

> > (especially since the imx287, for example, is used in many embedded
> > devices and is going to be in active production for the next 10+
> > years).
>
> Well, I guess you have a plan then. There are still 10+ years left to
> enjoy the benefits of a proper driver design...

:-)

Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@xxxxxxx

