Re: [PATCH net-next 5/8] net: lan966x: Add lag support for lan966x.

From: Vladimir Oltean
Date: Mon Jun 27 2022 - 05:40:42 EST


On Mon, Jun 27, 2022 at 08:46:12AM +0200, Horatiu Vultur wrote:
> > This incorrect logic seems to have been copied from ocelot from before
> > commit a14e6b69f393 ("net: mscc: ocelot: fix incorrect balancing with
> > down LAG ports").
> >
> > The issue is that you calculate bond_mask with only_active_ports=true.
> > This means that for_each_set_bit() will not iterate through the inactive
> > LAG ports, and won't set bond_mask as the destination PGID for those
> > ports.
> >
> > That isn't what is desired; as explained in that commit, inactive LAG
> > ports should be removed via the aggregation PGIDs and not via the
> > destination PGIDs. Otherwise, an FDB entry targeted towards the
> > LAG (effectively towards the "primary" LAG port, whose logical port ID
> > gives the LAG ID) will not egress even through the "secondary" LAG port
> > if the primary's link is down.
>
> Thanks for looking at this.
> That is correct, ocelot was the source of inspiration. The issue that
> you described in the mentioned commit is fixed in the last patch of this
> series.
> I will have a look at your commit and will try to integrate it. Thanks.

I figured that would be the case, although I didn't really understand
the explanation in patch 8/8 (arguably, it says that the switch tries to
send on the down port, not that it won't send on the up port, which is
the more relevant information). In any case, it would be better to
introduce code that works from the beginning than to fix it up in a
follow-up patch. I believe the commit I referenced is a simplification
either way, since it removes the "only_active_ports" argument from the
bond mask function.