Re: [PATCH net] lan966x: Fix crash when adding interface under a lag

From: Michal Swiatkowski
Date: Mon Feb 05 2024 - 05:58:30 EST


On Mon, Feb 05, 2024 at 10:44:34AM +0100, Horatiu Vultur wrote:
> The 02/05/2024 09:44, Michal Swiatkowski wrote:
>
> Hi Michal,
>
> >
> > On Mon, Feb 05, 2024 at 09:07:56AM +0100, Horatiu Vultur wrote:
> > > There is a crash when adding one of the lan966x interfaces under a lag
> > > interface. The issue can be reproduced like this:
> > > ip link add name bond0 type bond miimon 100 mode balance-xor
> > > ip link set dev eth0 master bond0
> > >
> > > > The reason is that when adding an interface under the lag, the driver goes
> > > > through all the ports and tries to figure out which other ports are under
> > > > that lag interface. The issue is that lan966x can have ports that are
> > > > NULL pointers, as they are not probed. So iterating over these ports
> > > > would just crash when they are dereferenced.
> > > > The fix consists of actually checking for NULL pointers before accessing
> > > > anything from the ports, like we do in other places.
> > >
> > > Fixes: cabc9d49333d ("net: lan966x: Add lag support for lan966x")
> > > Signed-off-by: Horatiu Vultur <horatiu.vultur@xxxxxxxxxxxxx>
> > > ---
> > > drivers/net/ethernet/microchip/lan966x/lan966x_lag.c | 9 +++++++--
> > > 1 file changed, 7 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c b/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c
> > > index 41fa2523d91d3..89a2c3176f1da 100644
> > > --- a/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c
> > > +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c
> > > @@ -37,19 +37,24 @@ static void lan966x_lag_set_aggr_pgids(struct lan966x *lan966x)
> > >
> > > >  	/* Now, set PGIDs for each active LAG */
> > > >  	for (lag = 0; lag < lan966x->num_phys_ports; ++lag) {
> > > > -		struct net_device *bond = lan966x->ports[lag]->bond;
> > > > +		struct lan966x_port *port = lan966x->ports[lag];
> > > >  		int num_active_ports = 0;
> > > > +		struct net_device *bond;
> > > >  		unsigned long bond_mask;
> > > >  		u8 aggr_idx[16];
> > > >
> > > > -		if (!bond || (visited & BIT(lag)))
> > > > +		if (!port || !port->bond || (visited & BIT(lag)))
> > > >  			continue;
> > > >
> > > > +		bond = lan966x->ports[lag]->bond;
> > Why not bond = port->bond?
>
> > That is also correct and clearer.
> > I think I just copied the line that I removed and put it here, as it
> > has the same effect.
> > I can update this in the next version.
>

Great, thanks. Feel free to add my Reviewed-by tag in the next version.

Michal

> >
> > > >  		bond_mask = lan966x_lag_get_mask(lan966x, bond);
> > > >
> > > >  		for_each_set_bit(p, &bond_mask, lan966x->num_phys_ports) {
> > > >  			struct lan966x_port *port = lan966x->ports[p];
> > > >
> > > > +			if (!port)
> > > > +				continue;
> > > > +
> > > >  			lan_wr(ANA_PGID_PGID_SET(bond_mask),
> > > >  			       lan966x, ANA_PGID(p));
> > > >  			if (port->lag_tx_active)
> > > --
> > > 2.34.1
> > >
> > Only nit, otherwise:
> > Reviewed-by: Michal Swiatkowski <michal.swiatkowski@xxxxxxxxxxxxxxx>
> >
> > Thanks,
> > Michal
>
> --
> /Horatiu
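
For readers following the thread, here is a minimal userspace sketch of the
pattern the patch applies: a fixed-size port array in which unprobed slots
stay NULL, and every loop checks the slot before dereferencing it. This is
not the driver code. The names `struct sw`, `struct port`, `struct bond_dev`,
`NUM_PHYS_PORTS`, `lag_get_mask()` and `count_active_lags()` are hypothetical
stand-ins chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_PHYS_PORTS 4	/* assumed size, for illustration only */

/* Hypothetical stand-ins for the driver's structures. */
struct bond_dev {
	int id;
};

struct port {
	struct bond_dev *bond;	/* NULL when the port is not under a LAG */
};

struct sw {
	/* Slots for ports that were never probed remain NULL. */
	struct port *ports[NUM_PHYS_PORTS];
};

/* Build a bitmask of the ports under 'bond', skipping unprobed slots. */
static uint32_t lag_get_mask(const struct sw *sw, const struct bond_dev *bond)
{
	uint32_t mask = 0;
	int p;

	for (p = 0; p < NUM_PHYS_PORTS; p++) {
		const struct port *port = sw->ports[p];

		/* Check the slot itself before touching port->bond. */
		if (!port || port->bond != bond)
			continue;

		mask |= UINT32_C(1) << p;
	}

	return mask;
}

/* Count distinct LAGs, visiting each one once via its lowest port. */
static int count_active_lags(const struct sw *sw)
{
	uint32_t visited = 0;
	int lags = 0;
	int lag;

	for (lag = 0; lag < NUM_PHYS_PORTS; lag++) {
		const struct port *port = sw->ports[lag];

		/* Same three-way test as in the patch: slot, bond, visited. */
		if (!port || !port->bond || (visited & (UINT32_C(1) << lag)))
			continue;

		/* Read the bond through the already-checked local pointer. */
		visited |= lag_get_mask(sw, port->bond);
		lags++;
	}

	return lags;
}
```

With ports 0 and 2 under one bond, port 3 under another, and slot 1 left
NULL, both helpers run without dereferencing the empty slot. Reading the
bond through the local `port` pointer, as suggested in the review, keeps the
NULL check and the access on the same variable.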