Re: [PATCH 2/2] net: thunderbolt: enlarge RX/TX ring and set NAPI weight for sustained load

From: Mika Westerberg

Date: Tue Apr 28 2026 - 10:37:23 EST


On Tue, Apr 28, 2026 at 02:54:58PM +0200, Andrew Lunn wrote:
> On Tue, Apr 28, 2026 at 09:42:53AM +0200, Mika Westerberg wrote:
> > On Mon, Apr 27, 2026 at 06:55:21PM -0700, Benjamin Berman wrote:
> > > The default TBNET_RING_SIZE of 256 and the NAPI_POLL_WEIGHT of 64
> > > implicit in netif_napi_add() are too small for host-to-host Thunderbolt
> > > networking under sustained bulk traffic. Running NCCL all-reduce over
> > > tb-lo on a three-node chain (two TB3 endpoints plus a TB4 Maple Ridge
> > > transit) produces rx_missed_errors at ~1 % of rx_packets on the transit
> > > and ~0.6 % on the endpoints, with rx_packets stalling against a peer's
> > > continuing tx_packets.
> > >
> > > Raise TBNET_RING_SIZE to 2048 (8x) and use netif_napi_add_weight() with
> > > a per-NAPI weight of 256 so tbnet_poll() drains more frames per softirq
> > > invocation. With matching sysctls (net.core.netdev_budget=1024,
> > > net.core.netdev_budget_usecs=8000) rx_missed_errors stays below 0.005 %
> > > over a 192 GB all-reduce workload on the same hardware.
> > >
> > > Generated-by: Claude Opus 4.7 <claude-opus-4-7@xxxxxxxxxxxxx>
> > > Tested-by: Benjamin Berman <benjamin.s.berman@xxxxxxxxx>
> > > Signed-off-by: Benjamin Berman <benjamin.s.berman@xxxxxxxxx>
> >
> > For ring size I don't have any objections. The current ring size 256 is
> > arbitrary and at the time seemed reasonable.
> >
> > For the poll weight there is this comment in netdevice.h:
> >
> > /* Default NAPI poll() weight
> >  * Device drivers are strongly advised to not use bigger value
> >  */
> > #define NAPI_POLL_WEIGHT 64
> >
> > But if you see an improvement using 256 here I'm fine with that unless the
> > network folks advise otherwise.
>
> I just did a quick sample of other drivers which change the NAPI
> weight. Of the 10 I looked at, 9 reduced the weight. Only one
> increased it.

Yeah, I noticed that too. That's why I'm asking for advice :)

> I would like the core netdev people to comment on this, before it is
> accepted.
>
> Questions which come to mind:
>
> Why is the polling not happening frequently enough?
>
> Is it frequently swapping between polling and interrupts?
>
> Is there interrupt coalescing going on, with the coalescing time set
> too high, so that by the time the interrupt fires the ring is full? Can
> you play with ethtool -C?

Thanks!

I'll leave these to Benjamin and Claude AI to answer.

One thing that could affect this is the interrupt throttling that the
hardware does. We have quite a big value there by default, so lowering it
may have an effect as well. I just posted a patch series in which one of
the patches makes this configurable in the tbnet driver, so you could apply
that and play with the throttling value:

https://lore.kernel.org/linux-usb/20260428072209.3084930-6-mika.westerberg@xxxxxxxxxxxxxxx/
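
When comparing throttling values, it may help to measure the same ratio the
patch description quotes (rx_missed_errors as a percentage of rx_packets)
over each test run instead of since boot. Below is a minimal sketch, assuming
the counters are read from the standard /sys/class/net/<iface>/statistics
files. The helper names are hypothetical and not part of any patch in this
thread.

```python
from pathlib import Path

# Counters read from /sys/class/net/<iface>/statistics/.
COUNTERS = ("rx_packets", "rx_missed_errors")


def snapshot(iface, root="/sys/class/net"):
    """Return the current values of COUNTERS for one interface.

    `root` is a parameter only so the function can be pointed at a
    different directory (hypothetical convenience, e.g. for testing).
    """
    stats = Path(root) / iface / "statistics"
    return {name: int((stats / name).read_text()) for name in COUNTERS}


def missed_percent(before, after):
    """rx_missed_errors as a percentage of rx_packets between two snapshots."""
    packets = after["rx_packets"] - before["rx_packets"]
    missed = after["rx_missed_errors"] - before["rx_missed_errors"]
    if packets <= 0:
        # No traffic received in the interval, nothing to report.
        return 0.0
    return 100.0 * missed / packets
```

Take one snapshot("<iface>") before the workload and one after it, then pass
both to missed_percent(). Repeating this for each throttling setting gives
numbers that are directly comparable with the ~1 % and 0.005 % figures above.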