Re: [PATCHv3 net 2/3] bonding: restructure ad_churn_machine
From: Jay Vosburgh
Date: Thu Feb 26 2026 - 19:37:51 EST
Hangbin Liu <liuhangbin@xxxxxxxxx> wrote:
>The current ad_churn_machine implementation only transitions the
>actor/partner churn state to churned or none after the churn timer expires.
>However, IEEE 802.1AX-2014 specifies that a port should enter the none
>state immediately once the actor’s port state enters synchronization.
>
>Another issue is that if the churn timer expires while the churn machine is
>not in the monitor state (e.g. already in churn), the state may remain
>stuck indefinitely with no further transitions. This becomes visible in
>multi-aggregator scenarios. For example:
>
>Ports 1 and 2 are in aggregator 1 (active)
>Ports 3 and 4 are in aggregator 2 (backup)
>
>Ports 1 and 2 should be in none
>Ports 3 and 4 should be in churned
>
>If a failover occurs due to port 2 link down/up, aggregator 2 becomes active.
>Under the current implementation, the resulting states may look like:
>
>agg 1 (backup): port 1 -> none, port 2 -> churned
>agg 2 (active): ports 3,4 keep in churned.
>
>The root cause is that ad_churn_machine() only clears the
>AD_PORT_CHURNED flag and starts a timer. When a churned port becomes active,
>its RX state becomes AD_RX_CURRENT, preventing the churn flag from being set
>again, leaving no way to retrigger the timer. Fixing this solely in
>ad_rx_machine() is insufficient.
>
>This patch rewrites ad_churn_machine according to IEEE 802.1AX-2014
>(Figures 6-23 and 6-24), ensuring correct churn detection, state transitions,
>and timer behavior. With new implementation, there is no need to set
>AD_PORT_CHURNED in ad_rx_machine().
>
>Fixes: 14c9551a32eb ("bonding: Implement port churn-machine (AD standard 43.4.17).")
>Reported-by: Liang Li <liali@xxxxxxxxxx>
>Tested-by: Liang Li <liali@xxxxxxxxxx>
>Signed-off-by: Hangbin Liu <liuhangbin@xxxxxxxxx>
I missed this last time it was posted, but reading it now I
think the functional change looks good, but I question the usefulness of
including the 25 line ASCII art version of the state diagram.
The standard is publicly available, so a comment saying that the
state machine logic conforms to IEEE 802.1AX-2014 figures 6-23 and 6-24
should be sufficient. Anyone seriously checking the code against the
standard will need to read the relevant text, so they'll be looking it
up anyway.
-J
>---
> drivers/net/bonding/bond_3ad.c | 96 +++++++++++++++++++++++++---------
> 1 file changed, 71 insertions(+), 25 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index c47f6a69fd2a..68258d61fd1c 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -44,7 +44,6 @@
> #define AD_PORT_STANDBY 0x80
> #define AD_PORT_SELECTED 0x100
> #define AD_PORT_MOVED 0x200
>-#define AD_PORT_CHURNED (AD_PORT_ACTOR_CHURN | AD_PORT_PARTNER_CHURN)
>
> /* Port Key definitions
> * key is determined according to the link speed, duplex and
>@@ -1254,7 +1253,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
> /* first, check if port was reinitialized */
> if (port->sm_vars & AD_PORT_BEGIN) {
> port->sm_rx_state = AD_RX_INITIALIZE;
>- port->sm_vars |= AD_PORT_CHURNED;
> /* check if port is not enabled */
> } else if (!(port->sm_vars & AD_PORT_BEGIN) && !port->is_enabled)
> port->sm_rx_state = AD_RX_PORT_DISABLED;
>@@ -1262,8 +1260,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
> else if (lacpdu && ((port->sm_rx_state == AD_RX_EXPIRED) ||
> (port->sm_rx_state == AD_RX_DEFAULTED) ||
> (port->sm_rx_state == AD_RX_CURRENT))) {
>- if (port->sm_rx_state != AD_RX_CURRENT)
>- port->sm_vars |= AD_PORT_CHURNED;
> port->sm_rx_timer_counter = 0;
> port->sm_rx_state = AD_RX_CURRENT;
> } else {
>@@ -1347,7 +1343,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
> port->partner_oper.port_state |= LACP_STATE_LACP_TIMEOUT;
> port->sm_rx_timer_counter = __ad_timer_to_ticks(AD_CURRENT_WHILE_TIMER, (u16)(AD_SHORT_TIMEOUT));
> port->actor_oper_port_state |= LACP_STATE_EXPIRED;
>- port->sm_vars |= AD_PORT_CHURNED;
> break;
> case AD_RX_DEFAULTED:
> __update_default_selected(port);
>@@ -1379,11 +1374,41 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
> * ad_churn_machine - handle port churn's state machine
> * @port: the port we're looking at
> *
>+ * IEEE 802.1AX-2014 Figure 6-23 - Actor Churn Detection machine state diagram
>+ *
>+ * BEGIN || (! port_enabled)
>+ * |
>+ * (3) (1) v
>+ * +----------------------+ ActorPort.Sync +-------------------------+
>+ * | NO_ACTOR_CHURN | <--------------------- | ACTOR_CHURN_MONITOR |
>+ * |======================| |=========================|
>+ * | actor_churn = FALSE; | ! ActorPort.Sync | actor_churn = FALSE; |
>+ * | | ---------------------> | Start actor_churn_timer |
>+ * +----------------------+ (4) +-------------------------+
>+ * ^ |
>+ * | |
>+ * | actor_churn_timer expired
>+ * | |
>+ * ActorPort.Sync | (2)
>+ * | +--------------------+ |
>+ * (3) | | ACTOR_CHURN | |
>+ * | |====================| |
>+ * +------------- | actor_churn = True | <-----------+
>+ * | |
>+ * +--------------------+
>+ *
>+ * Similar for the Figure 6-24 - Partner Churn Detection machine state diagram
>+ *
>+ * We don’t need to check actor_churn, because it can only be true when the
>+ * state is ACTOR_CHURN.
> */
> static void ad_churn_machine(struct port *port)
> {
>- if (port->sm_vars & AD_PORT_CHURNED) {
>- port->sm_vars &= ~AD_PORT_CHURNED;
>+ bool partner_synced = port->partner_oper.port_state & LACP_STATE_SYNCHRONIZATION;
>+ bool actor_synced = port->actor_oper_port_state & LACP_STATE_SYNCHRONIZATION;
>+
>+ /* ---- 1. begin or port not enabled ---- */
>+ if ((port->sm_vars & AD_PORT_BEGIN) || !port->is_enabled) {
> port->sm_churn_actor_state = AD_CHURN_MONITOR;
> port->sm_churn_partner_state = AD_CHURN_MONITOR;
> port->sm_churn_actor_timer_counter =
>@@ -1392,25 +1417,46 @@ static void ad_churn_machine(struct port *port)
> __ad_timer_to_ticks(AD_PARTNER_CHURN_TIMER, 0);
> return;
> }
>- if (port->sm_churn_actor_timer_counter &&
>- !(--port->sm_churn_actor_timer_counter) &&
>- port->sm_churn_actor_state == AD_CHURN_MONITOR) {
>- if (port->actor_oper_port_state & LACP_STATE_SYNCHRONIZATION) {
>- port->sm_churn_actor_state = AD_NO_CHURN;
>- } else {
>- port->churn_actor_count++;
>- port->sm_churn_actor_state = AD_CHURN;
>- }
>+
>+ if (port->sm_churn_actor_timer_counter)
>+ port->sm_churn_actor_timer_counter--;
>+
>+ if (port->sm_churn_partner_timer_counter)
>+ port->sm_churn_partner_timer_counter--;
>+
>+ /* ---- 2. timer expired, enter CHURN ---- */
>+ if (port->sm_churn_actor_state == AD_CHURN_MONITOR &&
>+ !port->sm_churn_actor_timer_counter) {
>+ port->sm_churn_actor_state = AD_CHURN;
>+ port->churn_actor_count++;
> }
>- if (port->sm_churn_partner_timer_counter &&
>- !(--port->sm_churn_partner_timer_counter) &&
>- port->sm_churn_partner_state == AD_CHURN_MONITOR) {
>- if (port->partner_oper.port_state & LACP_STATE_SYNCHRONIZATION) {
>- port->sm_churn_partner_state = AD_NO_CHURN;
>- } else {
>- port->churn_partner_count++;
>- port->sm_churn_partner_state = AD_CHURN;
>- }
>+
>+ if (port->sm_churn_partner_state == AD_CHURN_MONITOR &&
>+ !port->sm_churn_partner_timer_counter) {
>+ port->sm_churn_partner_state = AD_CHURN;
>+ port->churn_partner_count++;
>+ }
>+
>+ /* ---- 3. CHURN_MONITOR/CHURN + sync -> NO_CHURN ---- */
>+ if ((port->sm_churn_actor_state == AD_CHURN_MONITOR ||
>+ port->sm_churn_actor_state == AD_CHURN) && actor_synced)
>+ port->sm_churn_actor_state = AD_NO_CHURN;
>+
>+ if ((port->sm_churn_partner_state == AD_CHURN_MONITOR ||
>+ port->sm_churn_partner_state == AD_CHURN) && partner_synced)
>+ port->sm_churn_partner_state = AD_NO_CHURN;
>+
>+ /* ---- 4. NO_CHURN + !sync -> MONITOR ---- */
>+ if (port->sm_churn_actor_state == AD_NO_CHURN && !actor_synced) {
>+ port->sm_churn_actor_state = AD_CHURN_MONITOR;
>+ port->sm_churn_actor_timer_counter =
>+ __ad_timer_to_ticks(AD_ACTOR_CHURN_TIMER, 0);
>+ }
>+
>+ if (port->sm_churn_partner_state == AD_NO_CHURN && !partner_synced) {
>+ port->sm_churn_partner_state = AD_CHURN_MONITOR;
>+ port->sm_churn_partner_timer_counter =
>+ __ad_timer_to_ticks(AD_PARTNER_CHURN_TIMER, 0);
> }
> }
>
>--
>2.50.1
>
---
-Jay Vosburgh, jv@xxxxxxxxxxxxx