Re: [PATCH net] bonding: 802.3ad: Avoid packet loss when switching aggregator

From: Jay Vosburgh
Date: Mon Apr 08 2024 - 12:06:33 EST


Thomas Bogendoerfer <tbogendoerfer@xxxxxxx> wrote:

>If the selection logic decides to switch to a new aggregator, it disables
>all ports of the old aggregator but doesn't enable ports on the new
>aggregator. These ports are eventually enabled when the next LACPDU is
>received, which might take some time; until then there is no active
>port and transmitted frames are dropped. Avoid this by immediately
>enabling the already collected ports of the new aggregator.
>
>Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@xxxxxxx>
>---
> drivers/net/bonding/bond_3ad.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index c6807e473ab7..529e2a7c51e2 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -1876,6 +1876,13 @@ static void ad_agg_selection_logic(struct aggregator *agg,
> 				__disable_port(port);
> 			}
> 		}
>+
>+		/* enable ports on new active aggregator */
>+		for (port = best->lag_ports; port;
>+		     port = port->next_port_in_aggregator) {
>+			__enable_port(port);
>+		}
>+
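A minimal user-space model of the failure mode the commit message describes may help. Everything here is hypothetical (the model_* names are not kernel APIs): transmit picks the first enabled port of the active aggregator, and a NULL result stands for a dropped frame. Passing enable_new = true mimics the patch, which enables every port of the new aggregator unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model: only the fields needed to show the
 * problem, not the kernel's struct port / struct aggregator. */
struct model_port {
	bool enabled;                  /* stands in for "rx/tx enabled" */
	struct model_port *next;       /* like next_port_in_aggregator */
};

struct model_agg {
	struct model_port *lag_ports;  /* head of the aggregator's port list */
};

/* Hypothetical transmit hook: first enabled port of the active
 * aggregator, or NULL, which here means "frame dropped". */
static struct model_port *model_tx_port(const struct model_agg *active)
{
	struct model_port *p;

	for (p = active->lag_ports; p; p = p->next)
		if (p->enabled)
			return p;
	return NULL;
}

/* Aggregator switch as described in the commit message: disable all
 * ports of the old aggregator; enable_new toggles what the patch adds. */
static void model_switch(struct model_agg *old, struct model_agg *best,
			 bool enable_new)
{
	struct model_port *p;

	assert(old && best);
	for (p = old->lag_ports; p; p = p->next)
		p->enabled = false;
	if (enable_new)
		for (p = best->lag_ports; p; p = p->next)
			p->enabled = true;
}
```

Without enable_new, model_tx_port() on the new aggregator returns NULL until something else (in the real driver, the next LACPDU) enables a port.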

I think this will do the wrong thing if a port on the new aggregator
is not in a valid state to send or receive (i.e., its mux state is not
one of COLLECTING_DISTRIBUTING, COLLECTING, or DISTRIBUTING).
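The distinction can be written as a small predicate. This is a user-space sketch: the MODEL_MUX_* names are a local stand-in for the kernel's AD_MUX_* mux states (the values are arbitrary), and it only illustrates that a blanket __enable_port() would also touch ports that are, say, still WAITING or ATTACHED.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the 802.3ad mux states named in the text.
 * Values are arbitrary; this is not the kernel's enum. */
enum model_mux_state {
	MODEL_MUX_DETACHED,
	MODEL_MUX_WAITING,
	MODEL_MUX_ATTACHED,
	MODEL_MUX_COLLECTING,
	MODEL_MUX_DISTRIBUTING,
	MODEL_MUX_COLLECTING_DISTRIBUTING,
};

/* True if a port in this mux state may collect and/or distribute,
 * i.e. enabling it when its aggregator becomes active is reasonable. */
static bool model_mux_state_may_enable(enum model_mux_state s)
{
	switch (s) {
	case MODEL_MUX_COLLECTING:
	case MODEL_MUX_DISTRIBUTING:
	case MODEL_MUX_COLLECTING_DISTRIBUTING:
		return true;
	default:
		return false;	/* DETACHED, WAITING, ATTACHED */
	}
}
```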


As it happens, this situation is already handled just below this
code, but only for individual ports:

	/* if the selected aggregator is of join individuals
	 * (partner_system is NULL), enable their ports
	 */
	active = __get_active_agg(origin);

	if (active) {
		if (!__agg_has_partner(active)) {
			for (port = active->lag_ports; port;
			     port = port->next_port_in_aggregator) {
				__enable_port(port);
			}
			*update_slave_arr = true;
		}
	}

	rcu_read_unlock();

FWIW, looking at it, I'm not sure that "__agg_has_partner" is
the proper test for individual-ness, but I'd have to do a bit of poking
to confirm that. In any event, that's not what you want to change right
now.

Instead of adding another block that does more or less the same
thing, I'd suggest extending this logic to aggregators that do have a
partner: test each port's mux state for C_D, C, or D, and enable the
port accordingly. Probably something like (I have not tested or
compiled this at all):

	if (active) {
		if (!__agg_has_partner(active)) {
			[ ... the current !__agg_has_partner() stuff ]
		} else {
			for (port = active->lag_ports; port;
			     port = port->next_port_in_aggregator) {
				switch (port->sm_mux_state) {
				case AD_MUX_DISTRIBUTING:
				case AD_MUX_COLLECTING_DISTRIBUTING:
					ad_enable_collecting_distributing(port,
									  update_slave_arr);
					port->ntt = true;
					break;
				case AD_MUX_COLLECTING:
					ad_enable_collecting(port);
					ad_disable_distributing(port, update_slave_arr);
					port->ntt = true;
					break;
				default:
					break;
				}
			}
		}
	}

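A user-space model of this proposal may help when checking which ports end up enabled. It is a sketch under stated assumptions: the model_* types and helpers are hypothetical stand-ins that only flip flags (without the logging the real wrappers add), and has_partner simply takes the place of whatever partner/individual test ends up being used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified user-space model; none of these types or helpers are the
 * kernel's. Enumerator values are arbitrary. */
enum model_mux_state {
	MODEL_MUX_DETACHED,
	MODEL_MUX_WAITING,
	MODEL_MUX_ATTACHED,
	MODEL_MUX_COLLECTING,
	MODEL_MUX_DISTRIBUTING,
	MODEL_MUX_COLLECTING_DISTRIBUTING,
};

struct model_port {
	enum model_mux_state mux;	/* like port->sm_mux_state */
	bool collecting;		/* rx enabled */
	bool distributing;		/* tx enabled */
	bool ntt;			/* like port->ntt: send an LACPDU soon */
	struct model_port *next;	/* like next_port_in_aggregator */
};

struct model_agg {
	bool has_partner;		/* outcome of a partner test */
	struct model_port *lag_ports;
};

/* Hypothetical stand-ins for the wrapper helpers named in the sketch. */
static void model_enable_coll_dist(struct model_port *p, bool *update_slave_arr)
{
	p->collecting = true;
	p->distributing = true;
	*update_slave_arr = true;
}

static void model_enable_collecting(struct model_port *p)
{
	p->collecting = true;
}

static void model_disable_distributing(struct model_port *p,
				       bool *update_slave_arr)
{
	if (p->distributing) {
		p->distributing = false;
		*update_slave_arr = true;
	}
}

/* Enable ports of the newly active aggregator according to mux state. */
static void model_activate(struct model_agg *active, bool *update_slave_arr)
{
	struct model_port *p;

	assert(update_slave_arr);
	if (!active)
		return;

	if (!active->has_partner) {
		/* existing behaviour for individual ports: enable them all */
		for (p = active->lag_ports; p; p = p->next)
			p->collecting = p->distributing = true;
		*update_slave_arr = true;
		return;
	}

	for (p = active->lag_ports; p; p = p->next) {
		switch (p->mux) {
		case MODEL_MUX_DISTRIBUTING:
		case MODEL_MUX_COLLECTING_DISTRIBUTING:
			model_enable_coll_dist(p, update_slave_arr);
			p->ntt = true;
			break;
		case MODEL_MUX_COLLECTING:
			model_enable_collecting(p);
			model_disable_distributing(p, update_slave_arr);
			p->ntt = true;
			break;
		default:
			/* not ready yet: leave it to the mux state machine */
			break;
		}
	}
}
```

With a partnered aggregator whose ports are in COLLECTING_DISTRIBUTING, COLLECTING and ATTACHED, the first ends up collecting and distributing, the second only collecting, and the third is left untouched.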
Using the wrapper functions (instead of calling __enable_port()
et al. directly) means the transitions also get logged.
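The wrapper-versus-raw-helper point in miniature, as a hypothetical sketch: model_raw_enable() plays the part of __enable_port(), and model_enable_logged() reports the transition before making the same change. printf() is only a stand-in for the driver's debug logging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct model_port {
	int number;
	bool enabled;
};

/* Counts emitted messages so the difference is observable in a test. */
static unsigned int model_log_lines;

/* Raw helper: changes state silently. */
static void model_raw_enable(struct model_port *p)
{
	p->enabled = true;
}

/* Wrapper: same state change, but the transition is reported first. */
static void model_enable_logged(struct model_port *p)
{
	assert(p);
	printf("Enabling port %d\n", p->number);
	model_log_lines++;
	model_raw_enable(p);
}
```

Both calls leave the port enabled; only the wrapper leaves a trace, which is what makes aggregator switches diagnosable after the fact.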

-J



> 		/* Slave array needs update. */
> 		*update_slave_arr = true;
> 	}
>--
>2.35.3
>
>