Re: [PATCH net-next 5/8] net: lan966x: Add lag support for lan966x.
From: Horatiu Vultur
Date: Mon Jun 27 2022 - 02:42:29 EST
The 06/26/2022 17:11, Vladimir Oltean wrote:
>
> Hi Horatiu,
Hi Vladimir,
>
> Just casually browsing through the patches. A comment below.
>
> On Sun, Jun 26, 2022 at 03:04:48PM +0200, Horatiu Vultur wrote:
> > Add link aggregation hardware offload support for lan966x
> >
> > Signed-off-by: Horatiu Vultur <horatiu.vultur@xxxxxxxxxxxxx>
> > ---
> > .../net/ethernet/microchip/lan966x/Makefile | 2 +-
> > .../ethernet/microchip/lan966x/lan966x_lag.c | 296 ++++++++++++++++++
> > .../ethernet/microchip/lan966x/lan966x_main.h | 28 ++
> > .../microchip/lan966x/lan966x_switchdev.c | 78 ++++-
> > 4 files changed, 388 insertions(+), 16 deletions(-)
> > create mode 100644 drivers/net/ethernet/microchip/lan966x/lan966x_lag.c
> >
> > diff --git a/drivers/net/ethernet/microchip/lan966x/Makefile b/drivers/net/ethernet/microchip/lan966x/Makefile
> > index fd2e0ebb2427..0c22c86bdaa9 100644
> > --- a/drivers/net/ethernet/microchip/lan966x/Makefile
> > +++ b/drivers/net/ethernet/microchip/lan966x/Makefile
> > @@ -8,4 +8,4 @@ obj-$(CONFIG_LAN966X_SWITCH) += lan966x-switch.o
> > lan966x-switch-objs := lan966x_main.o lan966x_phylink.o lan966x_port.o \
> > lan966x_mac.o lan966x_ethtool.o lan966x_switchdev.o \
> > lan966x_vlan.o lan966x_fdb.o lan966x_mdb.o \
> > - lan966x_ptp.o lan966x_fdma.o
> > + lan966x_ptp.o lan966x_fdma.o lan966x_lag.o
> > diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c b/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c
> > new file mode 100644
> > index 000000000000..c721a05d44d2
> > --- /dev/null
> > +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c
> > @@ -0,0 +1,296 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +
> > +#include "lan966x_main.h"
> > +
> > +static void lan966x_lag_set_aggr_pgids(struct lan966x *lan966x)
> > +{
> > + u32 visited = GENMASK(lan966x->num_phys_ports - 1, 0);
> > + int p, lag, i;
> > +
> > + /* Reset destination and aggregation PGIDS */
> > + for (p = 0; p < lan966x->num_phys_ports; ++p)
> > + lan_wr(ANA_PGID_PGID_SET(BIT(p)),
> > + lan966x, ANA_PGID(p));
> > +
> > + for (p = PGID_AGGR; p < PGID_SRC; ++p)
> > + lan_wr(ANA_PGID_PGID_SET(visited),
> > + lan966x, ANA_PGID(p));
> > +
> > + /* The visited ports bitmask holds the list of ports offloading any
> > + * bonding interface. Initially we mark all these ports as unvisited,
> > + * then every time we visit a port in this bitmask, we know that it is
> > + * the lowest numbered port, i.e. the one whose logical ID == physical
> > + * port ID == LAG ID. So we mark as visited all further ports in the
> > + * bitmask that are offloading the same bonding interface. This way,
> > + * we set up the aggregation PGIDs only once per bonding interface.
> > + */
> > + for (p = 0; p < lan966x->num_phys_ports; ++p) {
> > + struct lan966x_port *port = lan966x->ports[p];
> > +
> > + if (!port || !port->bond)
> > + continue;
> > +
> > + visited &= ~BIT(p);
> > + }
> > +
> > + /* Now, set PGIDs for each active LAG */
> > + for (lag = 0; lag < lan966x->num_phys_ports; ++lag) {
> > + struct lan966x_port *port = lan966x->ports[lag];
> > + int num_active_ports = 0;
> > + struct net_device *bond;
> > + unsigned long bond_mask;
> > + u8 aggr_idx[16];
> > +
> > + if (!port || !port->bond || (visited & BIT(lag)))
> > + continue;
> > +
> > + bond = port->bond;
> > + bond_mask = lan966x_lag_get_mask(lan966x, bond, true);
> > +
> > + for_each_set_bit(p, &bond_mask, lan966x->num_phys_ports) {
> > + lan_wr(ANA_PGID_PGID_SET(bond_mask),
> > + lan966x, ANA_PGID(p));
> > + aggr_idx[num_active_ports++] = p;
> > + }
>
> This incorrect logic seems to have been copied from ocelot from before
> commit a14e6b69f393 ("net: mscc: ocelot: fix incorrect balancing with
> down LAG ports").
>
> The issue is that you calculate bond_mask with only_active_ports=true.
> This means the for_each_set_bit() will not iterate through the inactive
> LAG ports, and won't set the bond_mask as the PGID destination for those
> ports.
>
> That isn't what is desired; as explained in that commit, inactive LAG
> ports should be removed via the aggregation PGIDs and not via the
> destination PGIDs. Otherwise, an FDB entry targeted towards the
> LAG (effectively towards the "primary" LAG port, whose logical port ID
> gives the LAG ID) will not egress even the "secondary" LAG port if the
> primary's link is down.
Thanks for looking at this.
That is correct, ocelot was the source of inspiration. The issue that
you described in the mentioned commit is fixed in the last patch of this
series.
I will have a look at your commit and will try to integrated it. Thanks.
>
> > +
> > + for (i = PGID_AGGR; i < PGID_SRC; ++i) {
> > + u32 ac;
> > +
> > + ac = lan_rd(lan966x, ANA_PGID(i));
> > + ac &= ~bond_mask;
> > + /* Don't do division by zero if there was no active
> > + * port. Just make all aggregation codes zero.
> > + */
> > + if (num_active_ports)
> > + ac |= BIT(aggr_idx[i % num_active_ports]);
> > + lan_wr(ANA_PGID_PGID_SET(ac),
> > + lan966x, ANA_PGID(i));
> > + }
> > +
> > + /* Mark all ports in the same LAG as visited to avoid applying
> > + * the same config again.
> > + */
> > + for (p = lag; p < lan966x->num_phys_ports; p++) {
> > + struct lan966x_port *port = lan966x->ports[p];
> > +
> > + if (!port)
> > + continue;
> > +
> > + if (port->bond == bond)
> > + visited |= BIT(p);
> > + }
> > + }
> > +}
--
/Horatiu