[PATCH 2/2] net: thunderbolt: enlarge RX/TX ring and set NAPI weight for sustained load

From: Benjamin Berman
Date: Mon Apr 27 2026 - 21:57:20 EST

The default TBNET_RING_SIZE of 256 and the NAPI weight of 64
(NAPI_POLL_WEIGHT, applied implicitly by netif_napi_add()) are too
small for host-to-host Thunderbolt networking under sustained bulk
traffic. Running an NCCL all-reduce over tb-lo on a three-node chain
(two TB3 endpoints plus a TB4 Maple Ridge transit node) produces
rx_missed_errors at ~1 % of rx_packets on the transit node and ~0.6 %
on the endpoints, and a receiving node's rx_packets stalls while its
peer's tx_packets keeps climbing.
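
A miss ratio like the one above can be computed from the standard
per-interface counters. The minimal userspace sketch below reads
/sys/class/net/<ifname>/statistics/ and compares two samples taken over
an interval. The helper names and the interval-based approach are
illustrative assumptions, not the procedure used to obtain the numbers
quoted here, and the interface name is whatever the Thunderbolt network
device is called on the host.

```c
#include <inttypes.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Read one counter from /sys/class/net/<ifname>/statistics/<stat>.
 * Returns 0 on success and -1 if the file is missing or unparsable.
 */
int read_counter(const char *ifname, const char *stat, uint64_t *out)
{
	char path[256];
	FILE *f;
	int ok;

	snprintf(path, sizeof(path), "/sys/class/net/%s/statistics/%s",
		 ifname, stat);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%" SCNu64, out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Missed frames as a percentage of received frames; 0 if nothing arrived. */
double missed_pct(uint64_t missed, uint64_t rx)
{
	if (rx == 0)
		return 0.0;
	return 100.0 * (double)missed / (double)rx;
}

/*
 * Sample rx_missed_errors and rx_packets twice, `secs` seconds apart,
 * and store the interval miss ratio in *pct. Returns 0 on success.
 */
int sample_missed_pct(const char *ifname, unsigned int secs, double *pct)
{
	uint64_t miss0, rx0, miss1, rx1;

	if (read_counter(ifname, "rx_missed_errors", &miss0) ||
	    read_counter(ifname, "rx_packets", &rx0))
		return -1;
	sleep(secs);
	if (read_counter(ifname, "rx_missed_errors", &miss1) ||
	    read_counter(ifname, "rx_packets", &rx1))
		return -1;
	*pct = missed_pct(miss1 - miss0, rx1 - rx0);
	return 0;
}
```

Sampling over an interval isolates the sustained-load window; the raw
cumulative counters would also include idle periods before and after
the workload.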

Raise TBNET_RING_SIZE to 2048 (8x) and use netif_napi_add_weight() with
a NAPI weight of 256, so that each tbnet_poll() call can process up to
256 frames instead of 64. With matching sysctls
(net.core.netdev_budget=1024, net.core.netdev_budget_usecs=8000),
rx_missed_errors stays below 0.005 % over a 192 GB all-reduce workload
on the same hardware.
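
The new ring size has to remain a power of two if, as is common for
descriptor rings and as this sketch assumes for tbnet, slots are
addressed by masking free-running producer and consumer counters with
TBNET_RING_SIZE - 1. The minimal sketch below illustrates that scheme;
RING_SIZE, struct ring and the helper functions are hypothetical names,
not code from drivers/net/thunderbolt/main.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for TBNET_RING_SIZE after this patch. */
#define RING_SIZE 2048u

/* Masking only maps counters onto slots correctly for a power of two. */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
	      "RING_SIZE must be a power of two");

/* Hypothetical ring bookkeeping: both counters only ever increase. */
struct ring {
	uint32_t prod;	/* slots filled so far */
	uint32_t cons;	/* slots consumed so far */
};

/* Slot index for a free-running counter. */
uint32_t ring_index(uint32_t counter)
{
	return counter & (RING_SIZE - 1);
}

/* Slots in flight; unsigned subtraction stays correct across wrap-around. */
uint32_t ring_used(const struct ring *r)
{
	return r->prod - r->cons;
}

/* Slots the producer may still fill before overrunning the consumer. */
uint32_t ring_free(const struct ring *r)
{
	return RING_SIZE - ring_used(r);
}
```

Because 2048 divides 2^32, the masked index stays continuous when a
32-bit counter wraps past UINT32_MAX; a modulo by a size that is not a
power of two would skip slots at that point.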

Generated-by: Claude Opus 4.7 <claude-opus-4-7@xxxxxxxxxxxxx>
Tested-by: Benjamin Berman <benjamin.s.berman@xxxxxxxxx>
Signed-off-by: Benjamin Berman <benjamin.s.berman@xxxxxxxxx>
---
drivers/net/thunderbolt/main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/thunderbolt/main.c b/drivers/net/thunderbolt/main.c
index 7aae5d915..3a096f7c5 100644
--- a/drivers/net/thunderbolt/main.c
+++ b/drivers/net/thunderbolt/main.c
@@ -31,7 +31,7 @@
 #define TBNET_LOGIN_TIMEOUT 500
 #define TBNET_LOGOUT_TIMEOUT 1000
 
-#define TBNET_RING_SIZE 256
+#define TBNET_RING_SIZE 2048
 #define TBNET_LOGIN_RETRIES 60
 #define TBNET_LOGOUT_RETRIES 10
 #define TBNET_E2E BIT(0)
@@ -1383,7 +1383,7 @@ static int tbnet_probe(struct tb_service *svc, const struct tb_service_id *id)
 	dev->features = dev->hw_features | NETIF_F_HIGHDMA;
 	dev->hard_header_len += sizeof(struct thunderbolt_ip_frame_header);
 
-	netif_napi_add(dev, &net->napi, tbnet_poll);
+	netif_napi_add_weight(dev, &net->napi, tbnet_poll, 256);
 
 	/* MTU range: 68 - 65522 */
 	dev->min_mtu = ETH_MIN_MTU;
--
2.43.0