Re: [PATCH net-next 0/3] net: bcmgenet: collapse TX priority queues
From: Nicolai Buchwitz
Date: Mon Jun 15 2026 - 07:45:54 EST
Hi Jakub
On 13.6.2026 23:57, Jakub Kicinski wrote:
> On Fri, 12 Jun 2026 22:59:12 +0200 Nicolai Buchwitz wrote:
>> Tested on Raspberry Pi CM4 (BCM2711):
>> - Ovidiu's reproducer (iperf3 -u -b0 -P16 -t60) no longer trips
>>   NETDEV_WATCHDOG.
>> - UDP sustains 956 Mbit/s line rate over 60 s with 0 datagrams
>>   lost (0/4952890).
>> - Single-stream TCP throughput unchanged at 943 Mbit/s.
>
> Of course it has no impact on a single TCP stream test, since TCP
> stream can only use one queue. If anything it should help.
> The testing here is not very convincing. At least install a realistic
> qdisc (fq/fq_codel/cake) and run multi-stream test with multiple cores?
> What's the CPU idle delta in such a test?
Fair. Tests I ran with fq_codel:
# TCP
iperf3 -c <remote> -P 16
# UDP
iperf3 -c <remote> -u -b 1000M -P 16
# RR
iperf3 -c <remote> -t 60 &
netperf -H <remote> -t TCP_RR -l 30 -- -r 1,1
With the following results (all based on net-next with PP):
                              PP       PP+WRR   PP+series
TCP -P16 (Mbit/s)             938      941      941
TCP retransmits               56228    57679    48606
UDP -b1000M -P16 (Mbit/s)     956      956      956
TCP_RR under TCP load         451.7    453.8    596.8
CPU idle                      91.55%   90.53%   90.97%
CPU0 softirq                  32.6%    33.4%    33.5%
CPU1-3 idle (avg)             97.6%    97.7%    97.7%
So WRR fixes the watchdog issue but is otherwise within noise. This series
adds ~31% TCP_RR under load and reduces retransmits by ~14%. Note that
bcmgenet on BCM2711 has only two IRQs and no per-queue affinity, so all
HW interrupts are handled by one core regardless of queue count.
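
For anyone who wants to recheck the percentages, here is a minimal sketch
in Python that recomputes the relative deltas from the table. The numbers
are copied from the table; the dictionary layout and the pct_change()
helper are only illustrative names, not part of any tool used in the test.
It prints the delta against both baselines (PP and PP+WRR), since the
rounded figures differ slightly depending on which one is used.

```python
# Figures copied from the results table (TCP_RR and retransmit rows only).
results = {
    "TCP_RR under TCP load": {"PP": 451.7, "PP+WRR": 453.8, "PP+series": 596.8},
    "TCP retransmits": {"PP": 56228, "PP+WRR": 57679, "PP+series": 48606},
}


def pct_change(baseline, value):
    """Relative change of value versus baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Delta of the PP+series column against each of the two baselines.
deltas = {
    (metric, base): pct_change(cols[base], cols["PP+series"])
    for metric, cols in results.items()
    for base in ("PP", "PP+WRR")
}

for (metric, base), delta in deltas.items():
    print(f"{metric}: PP+series vs {base}: {delta:+.1f}%")
```

With these inputs the TCP_RR gain comes out between roughly 31% and 32%,
and the retransmit reduction between roughly 14% and 16%, depending on the
baseline.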
> The reason for this change is not coming thru from the submission.
> Ovidiu's patch makes much more intuitive sense. I'll apply that,
> please rebase.
Will rebase and resend with hopefully better reasoning :)
Thanks
Nicolai