Re: [PATCH net-next v5 00/10] Decouple receive and transmit enablement in team driver
From: Marc Harvey
Date: Tue Apr 07 2026 - 19:06:28 EST
On Mon, Apr 6, 2026 at 10:04 PM Marc Harvey <marcharvey@xxxxxxxxxx> wrote:
>
> On Mon, Apr 6, 2026 at 7:44 AM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> >
> > On Mon, 06 Apr 2026 03:03:36 +0000 Marc Harvey wrote:
> > > Allow independent control over receive and transmit enablement states
> > > for aggregated ports in the team driver.
> > >
> > > The motivation is that IEEE 802.3ad LACP "independent control" can't
> > > be implemented for the team driver currently. This was added to the
> > > bonding driver in commit 240fd405528b ("bonding: Add independent
> > > control state machine").
> > >
> > > This series also has a few patches that add tests to show that the old
> > > coupled enablement still works and that the new decoupled enablement
> > > works as intended (4, 5, and 10).
> > >
> > > There are three patches with small fixes as well, with the goal of
> > > making the final decoupling patch clearer (1, 2, and 3).
> >
> > activebackup:
> >
> > TAP version 13
> > 1..1
> > # overriding timeout to 2400
> > # selftests: drivers/net/team: teamd_activebackup.sh
> > # Setting up two-link aggregation for runner activebackup
> > # Teamd version is: teamd 1.32
> > # Conf files are /tmp/tmp.ydjNK9Um7H and /tmp/tmp.xZuc3cWbN0
> > # This program is not intended to be run as root.
> > # This program is not intended to be run as root.
> > # Created team devices
> > # Teamd PIDs are 21457 and 21461
> > # exec of "ip link set eth0 up" failed: No such file or directory
> > # exec of "ip link set eth0 up" failed: No such file or directory
> > # exec of "ip link set eth1 up" failed: No such file or directory
> > # exec of "ip link set eth1 up" failed: No such file or directory
> > # PING fd00::2 (fd00::2) 56 data bytes
> > # 64 bytes from fd00::2: icmp_seq=1 ttl=64 time=0.753 ms
> > #
> > # --- fd00::2 ping statistics ---
> > # 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> > # rtt min/avg/max/mdev = 0.753/0.753/0.753/0.000 msPacket count for test_team2 was 0
> > # Waiting for eth0 in ns2-lZ0gqd to stop receiving
> > # Packet count for eth0 was 0Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # Waiting for eth1 in ns2-lZ0gqd to stop receiving
> > # Packet count for eth1 was 0Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # TEST: teamd active backup runner test [FAIL]
> > # Traffic did not reach team interface in NS2.
> > # Tearing down two-link aggregation
> > # Failed to kill daemon: Timer expired
> > # Failed to kill daemon: Timer expired
> > # Sending sigkill to teamd for test_team1
> > # rm: cannot remove '/var/run/teamd/test_team1.pid': No such file or directory
> > # rm: cannot remove '/var/run/teamd/test_team1.sock': No such file or directory
> > # Sending sigkill to teamd for test_team2
> > # rm: cannot remove '/var/run/teamd/test_team2.pid': No such file or directory
> > # rm: cannot remove '/var/run/teamd/test_team2.sock': No such file or directory
> > not ok 1 selftests: drivers/net/team: teamd_activebackup.sh # exit=1
> >
> >
> > transmit_failover:
> >
> > TAP version 13
> > 1..1
> > # overriding timeout to 2400
> > # selftests: drivers/net/team: transmit_failover.sh
> > # Error: ipv6: address not found.
> > # Setting team in ns2-yxjiUo to mode roundrobin
> > # Error: ipv6: address not found.
> > # Setting team in ns1-Jht6kA to mode broadcast
> > # Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # TEST: Failover of 'broadcast' test [FAIL]
> > # eth0 not transmitting when both links enabled
> > # Setting team in ns1-Jht6kA to mode roundrobin
> > # Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # TEST: Failover of 'roundrobin' test [FAIL]
> > # eth0 not transmitting when both links enabled
> > # Setting team in ns1-Jht6kA to mode random
> > # Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # TEST: Failover of 'random' test [FAIL]
> > # eth0 not transmitting when both links enabled
> > not ok 1 selftests: drivers/net/team: transmit_failover.sh # exit=1
> > --
> > pw-bot: cr
>
> Apologies for all of the test failures. Before sending this revision,
> I ran each test thousands of times and observed no failures, so I
> thought the flakiness would be resolved.
>
> No matter what I try, I can't recreate either issue on my end. I've
> tried building with the exact config from one of the test runs
> (https://netdev-ctrl.bots.linux.dev/logs/vmksft/bonding/results/590921/).
> I've tried stressing the VM according to
> https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style#reproducing-unstable-tests
> (this makes the tests time out, but I can still see traffic). I've
> tried using the netdev-testing/net-next-2026-04-06--09-00 kernel
> source. I've tried in nested and unnested virtual machines. I've also
> tried running multiple test instances in parallel, but nothing
> recreates the issues. The issues seem related to tcpdump, but without
> reproducing them, I can only guess. Any suggestions for running the
> tests exactly as the CI does would be greatly appreciated.
>
> - Marc

Thank you very much to kuniyu@xxxxxxxxxx, who figured out how to
recreate the issue on Fedora. Fedora's /etc/services maps TCP port
1234 to the "search-agent" service (which is normal), and tcpdump uses
that mapping to replace port numbers with service names in its output.
So the tests were looking for ${ip_address}.1234, but tcpdump was
printing ${ip_address}.search-agent. What is strange is that the test
already uses tcpdump's "-n" option: "Don't convert addresses (i.e.,
host addresses, port numbers, etc.) to names."
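
To make the mismatch concrete, here is a minimal sketch. It is not the
selftest code: the helper name saw_port and both sample lines are made
up for illustration, and only the port number and service name come
from the description above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a captured line mentions <ip>.<port>.
saw_port() {
	# $1 = captured line, $2 = IP address, $3 = numeric port
	printf '%s\n' "$1" | grep -qF "$2.$3"
}

# Illustrative lines only, not real capture output.
numeric_line='IP6 fd00::1.40000 > fd00::2.1234: Flags [S]'
named_line='IP6 fd00::1.40000 > fd00::2.search-agent: Flags [S]'

# A numeric port matches the pattern the test is looking for.
saw_port "$numeric_line" 'fd00::2' 1234 && echo "numeric: match"
# A port rendered as a service name does not, so the packet count stays 0.
saw_port "$named_line" 'fd00::2' 1234 || echo "named: no match"
```

A fixed-string match on the numeric port can only succeed if the
capture tool leaves the port untranslated.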

It turns out that Fedora has a patched version of tcpdump that
separates the normal "-n" option into two options! "-n" handles host
addresses, and "-nn" handles port and protocol numbers. The tcpdump
invocation used by the selftests only uses "-n". What's stranger is
that passing "-nn" to tcpdump is actually portable, because under the
hood it is treated as a counter, with or without the Fedora patch:
https://github.com/the-tcpdump-group/tcpdump/blob/master/tcpdump.c#L1915
(thanks again to Kuniyuki for discovering this).
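
As a rough illustration of what "treated as a counter" means, the
sketch below counts repeated "-n" flags with POSIX getopts. This is a
generic example and not tcpdump's source (tcpdump is written in C);
count_n is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count how many times -n was passed, which is how
# a counter-style flag behaves.
count_n() {
	nflag=0
	OPTIND=1	# reset so the function can be called more than once
	while getopts "n" opt "$@"; do
		case "$opt" in
		n) nflag=$((nflag + 1)) ;;
		*) return 2 ;;
		esac
	done
	echo "$nflag"
}

count_n -n	# prints 1
count_n -nn	# prints 2
count_n -n -n	# prints 2
```

A parser that only checks for a nonzero count accepts "-nn" exactly as
it accepts "-n", while one that distinguishes a count of 1 from 2 can
give the second "n" its own meaning.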

For v6, I will just change the TCP port to one that is not used by a
service, and will make the tcpdump helper function in the
net/forwarding lib use "-nn" instead of "-n".

- Marc