Re: [PATCHv3 net 2/3] bonding: restructure ad_churn_machine
From: Hangbin Liu
Date: Thu Feb 26 2026 - 19:53:58 EST
On Thu, Feb 26, 2026 at 04:36:46PM -0800, Jay Vosburgh wrote:
> Hangbin Liu <liuhangbin@xxxxxxxxx> wrote:
>
> >The current ad_churn_machine implementation only transitions the
> >actor/partner churn state to churned or none after the churn timer expires.
> >However, IEEE 802.1AX-2014 specifies that a port should enter the none
> >state immediately once the actor’s port state enters synchronization.
> >
> >Another issue is that if the churn timer expires while the churn machine is
> >not in the monitor state (e.g. already in churn), the state may remain
> >stuck indefinitely with no further transitions. This becomes visible in
> >multi-aggregator scenarios. For example:
> >
> >Ports 1 and 2 are in aggregator 1 (active)
> >Ports 3 and 4 are in aggregator 2 (backup)
> >
> >Ports 1 and 2 should be in none
> >Ports 3 and 4 should be in churned
> >
> >If a failover occurs due to port 2 link down/up, aggregator 2 becomes active.
> >Under the current implementation, the resulting states may look like:
> >
> >agg 1 (backup): port 1 -> none, port 2 -> churned
> >agg 2 (active): ports 3,4 remain in churned.
> >
> >The root cause is that ad_churn_machine() only clears the
> >AD_PORT_CHURNED flag and starts a timer. When a churned port becomes active,
> >its RX state becomes AD_RX_CURRENT, preventing the churn flag from being set
> >again, leaving no way to retrigger the timer. Fixing this solely in
> >ad_rx_machine() is insufficient.
> >
> >This patch rewrites ad_churn_machine according to IEEE 802.1AX-2014
> >(Figures 6-23 and 6-24), ensuring correct churn detection, state transitions,
> >and timer behavior. With the new implementation, there is no need to set
> >AD_PORT_CHURNED in ad_rx_machine().
> >
> >Fixes: 14c9551a32eb ("bonding: Implement port churn-machine (AD standard 43.4.17).")
> >Reported-by: Liang Li <liali@xxxxxxxxxx>
> >Tested-by: Liang Li <liali@xxxxxxxxxx>
> >Signed-off-by: Hangbin Liu <liuhangbin@xxxxxxxxx>
>
> I missed this last time it was posted, but reading it now I
> think the functional change looks good, but I question the usefulness of
> including the 25 line ASCII art version of the state diagram.
>
> The standard is publicly available, so a comment saying that the
> state machine logic conforms to IEEE 802.1AX-2014 figures 6-23 and 6-24
> should be sufficient. Anyone seriously checking the code against the
> standard will need to read the relevant text, so they'll be looking it
> up anyway.

I added it here to help new readers and reviewers understand the logic
quickly. If you think there’s no need to include it in the code, maybe
we can move it to the commit description?

Thanks
Hangbin
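
For readers without the standard at hand, the transitions the commit
message describes can be sketched roughly as below. This is a standalone
illustration based on the description above (leave churn as soon as the
port is in sync, go back to monitoring when sync is lost), not the code
from the patch. All names here (churn_port, churn_step, and so on) are
hypothetical, and the tick count is an assumed value for the sketch only.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the bonding driver's identifiers. */
enum churn_state {
	CHURN_MONITOR,	/* timer running, waiting for the port to sync */
	CHURN_NONE,	/* port is in sync: "none" */
	CHURN_CHURNED,	/* timer expired without sync: "churned" */
};

struct churn_port {
	enum churn_state state;
	int timer_ticks;	/* ticks left before the churn timer expires */
};

#define CHURN_TIMEOUT_TICKS 60	/* assumed value, for this sketch only */

/* (Re)start monitoring: arm the timer and wait for sync. */
void churn_enter_monitor(struct churn_port *p)
{
	p->state = CHURN_MONITOR;
	p->timer_ticks = CHURN_TIMEOUT_TICKS;
}

/* Advance one tick; in_sync stands for the port state's sync bit. */
void churn_step(struct churn_port *p, bool port_enabled, bool in_sync)
{
	if (!port_enabled) {
		churn_enter_monitor(p);
		return;
	}

	switch (p->state) {
	case CHURN_MONITOR:
		if (in_sync)
			p->state = CHURN_NONE;		/* sync wins immediately */
		else if (p->timer_ticks <= 1)
			p->state = CHURN_CHURNED;	/* timer expired */
		else
			p->timer_ticks--;
		break;
	case CHURN_NONE:
		if (!in_sync)
			churn_enter_monitor(p);		/* lost sync: re-arm timer */
		break;
	case CHURN_CHURNED:
		if (in_sync)
			p->state = CHURN_NONE;		/* no timer needed to leave */
		break;
	}
}
```

The point of the sketch is that every state has an exit driven by the
sync bit, so a churned port that later becomes active cannot stay stuck
waiting for a timer that is never restarted.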