[PATCH 0/3] net: ntb_netdev: Add Multi-queue support
From: Koichiro Den
Date: Tue Feb 24 2026 - 10:28:22 EST
Hi,
ntb_netdev currently hard-codes a single NTB transport queue pair, so
the datapath effectively runs as a single-queue netdev no matter how
many CPUs or parallel flows are available.
The longer-term motivation here is throughput scale-out: allow
ntb_netdev to grow beyond the single-QP bottleneck and make it possible
to spread TX/RX work across multiple queue pairs as link speeds and core
counts keep increasing.
Multi-queue also unlocks the standard networking knobs on top of it. In
particular, once the device exposes multiple TX queues, qdisc/tc can
steer flows/traffic classes into different queues (via
skb->queue_mapping), enabling per-flow/per-class scheduling and QoS in a
familiar way.
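As a rough illustration of why multiple TX queues matter for flow
spreading: when nothing else has picked a queue, the stack can scale a
per-flow hash into the range of real TX queues. The sketch below is a
userspace model using the same arithmetic as the kernel's
reciprocal_scale() helper. The function name scale_hash_to_queue() is
made up for this example, and explicit steering via qdisc/tc is a
separate path that is not modeled here.

```c
#include <stdint.h>

/*
 * Hypothetical userspace model: map a 32-bit flow hash onto a queue
 * index in [0, num_queues) without a division, using the same
 * multiply-and-shift arithmetic as the kernel's reciprocal_scale().
 */
static uint32_t scale_hash_to_queue(uint32_t hash, uint32_t num_queues)
{
	return (uint32_t)(((uint64_t)hash * num_queues) >> 32);
}
```

With a single queue every hash maps to index 0, which is the situation
ntb_netdev is in today. With two or more queues, distinct flow hashes
can land on distinct queue indices.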
This series is a small plumbing step in that direction:
1) Introduce a per-queue context object (struct ntb_netdev_queue) and
move queue-pair state out of struct ntb_netdev. Probe creates queue
pairs in a loop and configures the netdev queue counts to match the
number that was successfully created.
2) Expose ntb_num_queues as a module parameter to request multiple
queue pairs at probe time. The value is clamped to 1..64 and kept
read-only for now (no runtime reconfiguration).
3) Report the active queue-pair count via ethtool -l (get_channels),
so users can confirm the device configuration from user space.
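To make items 1) and 2) more concrete, here is a minimal userspace
sketch of the intended behaviour. It is not the driver code. Only the
1..64 clamp and "use the number of queue pairs actually created" come
from the description above. The names queue_ctx, clamp_num_queues(),
create_queues() and the `available` argument are hypothetical, and
stopping at the first failed queue pair is an assumption.

```c
#define MODEL_MAX_QUEUES 64	/* upper bound stated in this cover letter */

/* Hypothetical stand-in for the per-queue context object. */
struct queue_ctx {
	unsigned int index;	/* queue index within the netdev */
	int active;		/* non-zero once the queue pair exists */
};

/* Clamp a requested queue count into 1..MODEL_MAX_QUEUES. */
static unsigned int clamp_num_queues(long requested)
{
	if (requested < 1)
		return 1;
	if (requested > MODEL_MAX_QUEUES)
		return MODEL_MAX_QUEUES;
	return (unsigned int)requested;
}

/*
 * Model of the probe loop: try to set up `wanted` queue pairs and stop
 * at the first one that cannot be created (assumption). `available`
 * models how many queue pairs the transport can still hand out.
 * Returns the number of queue pairs that were set up, which is the
 * value the netdev queue counts would then be configured to.
 */
static unsigned int create_queues(struct queue_ctx *q, unsigned int wanted,
				  unsigned int available)
{
	unsigned int i;

	for (i = 0; i < wanted; i++) {
		if (i >= available)
			break;
		q[i].index = i;
		q[i].active = 1;
	}
	return i;
}
```

For example, a request of 1000 clamps to 64, and if only 2 of 4
requested queue pairs can be created the model returns 2.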
Compatibility:
- Default remains ntb_num_queues=1, so behaviour is unchanged unless
the user explicitly requests more queues.
Kernel base:
- ntb-next latest:
commit 7b3302c687ca ("ntb_hw_amd: Fix incorrect debug message in link
disable path")
Usage (example):
- modprobe ntb_netdev ntb_num_queues=<N> # parameter added by patch 2
- ethtool -l <ifname> # reporting added by patch 3
Patch summary:
1/3 net: ntb_netdev: Introduce per-queue context
2/3 net: ntb_netdev: Make queue pair count configurable
3/3 net: ntb_netdev: Expose queue pair count via ethtool -l
Testing / results:
Environment / command line:
- 2x R-Car S4 Spider boards
"Kernel base" (see above) + this series
- For TCP load:
[RC] $ sudo iperf3 -s
[EP] $ sudo iperf3 -Z -c ${SERVER_IP} -l 65480 -w 512M -P 4
- For UDP load:
[RC] $ sudo iperf3 -s
[EP] $ sudo iperf3 -ub0 -c ${SERVER_IP} -l 65480 -w 512M -P 4
Before (without this series):
TCP / UDP : 602 Mbps / 598 Mbps
After (ntb_num_queues=1):
TCP / UDP : 588 Mbps / 605 Mbps
After (ntb_num_queues=2):
TCP / UDP : 602 Mbps / 598 Mbps
Notes:
In my current test environment, enabling multiple queue pairs does
not improve throughput. The receive-side memcpy in ntb_transport is
the dominant cost and limits scaling at present.
Still, this series lays the groundwork for future scaling, for
example once a transport backend is introduced that avoids memcpy
to/from PCI memory space on both ends (see the superseded RFC
series:
https://lore.kernel.org/all/20251217151609.3162665-1-den@xxxxxxxxxxxxx/).
Best regards,
Koichiro
Koichiro Den (3):
net: ntb_netdev: Introduce per-queue context
net: ntb_netdev: Make queue pair count configurable
net: ntb_netdev: Expose queue pair count via ethtool -l
drivers/net/ntb_netdev.c | 326 +++++++++++++++++++++++++++------------
1 file changed, 228 insertions(+), 98 deletions(-)
--
2.51.0