[PATCH 0/3] net: ntb_netdev: Add Multi-queue support

From: Koichiro Den

Date: Tue Feb 24 2026 - 10:28:22 EST


Hi,

ntb_netdev currently hard-codes a single NTB transport queue pair, so
the datapath effectively runs as a single-queue netdev no matter how
many CPUs or parallel flows are available.

The longer-term motivation here is throughput scale-out: allow
ntb_netdev to grow beyond the single-QP bottleneck and make it possible
to spread TX/RX work across multiple queue pairs as link speeds and core
counts keep increasing.

Multi-queue operation also unlocks the standard networking knobs on
top of the driver. In particular, once the device exposes multiple TX
queues, qdisc/tc can steer flows or traffic classes into different
queues (via skb->queue_mapping), enabling per-flow/per-class
scheduling and QoS in the familiar way.
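
To make that concrete, here is a minimal sketch of how a multi-queue
TX path can honor the qdisc-assigned mapping. The names below
(ntb_netdev_queue, dev->queues, ntb_netdev_queue_xmit) are
illustrative assumptions of mine, not code from this series:

static netdev_tx_t ntb_netdev_start_xmit(struct sk_buff *skb,
                                         struct net_device *ndev)
{
        struct ntb_netdev *dev = netdev_priv(ndev);
        /* qdisc/tc recorded the target TX queue in
         * skb->queue_mapping before ndo_start_xmit ran */
        u16 qidx = skb_get_queue_mapping(skb);
        /* hypothetical per-queue context array in the priv struct */
        struct ntb_netdev_queue *q = &dev->queues[qidx];

        /* hand the skb to the queue pair backing TX queue qidx
         * (hypothetical helper) */
        return ntb_netdev_queue_xmit(q, skb);
}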

This series is a small plumbing step in that direction:

1) Introduce a per-queue context object (struct ntb_netdev_queue) and
move queue-pair state out of struct ntb_netdev. Probe creates queue
pairs in a loop and configures the netdev queue counts to match the
number that was successfully created; a rough sketch of this plumbing
follows this list.

2) Expose ntb_num_queues as a module parameter to request multiple
queue pairs at probe time. The value is clamped to 1..64 and kept
read-only for now (no runtime reconfiguration).

3) Report the active queue-pair count via ethtool -l (get_channels),
so users can confirm the device configuration from user space.
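
For illustration, a rough sketch of the pieces from patches 1/3 and
2/3 follows. The struct layout, the create_queue_pair() helper and
the omitted error handling are my assumptions, not the actual patch
contents:

static unsigned int ntb_num_queues = 1;
module_param(ntb_num_queues, uint, 0444);      /* read-only for now */
MODULE_PARM_DESC(ntb_num_queues,
                 "Number of NTB transport queue pairs (clamped to 1..64)");

/* per-queue context: one transport queue pair plus its state */
struct ntb_netdev_queue {
        struct ntb_transport_qp *qp;
        struct ntb_netdev *dev;        /* back-pointer for callbacks */
};

static int ntb_netdev_probe(struct device *client_dev)
{
        /* ... netdev allocation etc. omitted ... */
        unsigned int want = clamp(ntb_num_queues, 1U, 64U);
        unsigned int i, count = 0;

        for (i = 0; i < want; i++) {
                /* stop at the first failure and keep what we got */
                if (create_queue_pair(dev, &dev->queues[i]) < 0)
                        break;
                count++;
        }

        /* size the netdev queue counts to what was actually
         * created (error checking omitted) */
        netif_set_real_num_tx_queues(ndev, count);
        netif_set_real_num_rx_queues(ndev, count);
        /* ... remaining probe setup ... */
}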

Compatibility:
- Default remains ntb_num_queues=1, so behaviour is unchanged unless
the user explicitly requests more queues.

Kernel base:
- ntb-next latest:
commit 7b3302c687ca ("ntb_hw_amd: Fix incorrect debug message in link
disable path")

Usage (example):
- modprobe ntb_netdev ntb_num_queues=<N>   # parameter added by patch 2/3
- ethtool -l <ifname>                      # reporting added by patch 3/3
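
For reference, a minimal get_channels callback could look roughly
like this; reporting each queue pair as a "combined" channel and the
dev->num_queues field are my assumptions, not necessarily what patch
3/3 does:

static void ntb_netdev_get_channels(struct net_device *ndev,
                                    struct ethtool_channels *ch)
{
        struct ntb_netdev *dev = netdev_priv(ndev);

        /* one NTB queue pair serves both directions, so report
         * it as a combined (TX+RX) channel */
        ch->max_combined = 64;          /* module parameter ceiling */
        ch->combined_count = dev->num_queues;  /* hypothetical field */
}

With ntb_num_queues=2, ethtool -l <ifname> would then show a combined
channel count of 2.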

Patch summary:
1/3 net: ntb_netdev: Introduce per-queue context
2/3 net: ntb_netdev: Make queue pair count configurable
3/3 net: ntb_netdev: Expose queue pair count via ethtool -l

Testing / results:
Environment / command line:
- 2x R-Car S4 Spider boards
- Kernel: "Kernel base" (see above) + this series
- For TCP load:
[RC] $ sudo iperf3 -s
[EP] $ sudo iperf3 -Z -c ${SERVER_IP} -l 65480 -w 512M -P 4
- For UDP load:
[RC] $ sudo iperf3 -s
[EP] $ sudo iperf3 -u -b 0 -c ${SERVER_IP} -l 65480 -w 512M -P 4

Before (without this series):
TCP / UDP : 602 Mbps / 598 Mbps

After (ntb_num_queues=1):
TCP / UDP : 588 Mbps / 605 Mbps

After (ntb_num_queues=2):
TCP / UDP : 602 Mbps / 598 Mbps

Notes:
In my current test environment, enabling multiple queue pairs does
not improve throughput. The receive-side memcpy in ntb_transport is
the dominant cost and limits scaling at present.

Still, this series lays the groundwork for future scaling, for
example once a transport backend is introduced that avoids memcpy
to/from PCI memory space on both ends (see the superseded RFC
series:
https://lore.kernel.org/all/20251217151609.3162665-1-den@xxxxxxxxxxxxx/).


Best regards,
Koichiro

Koichiro Den (3):
net: ntb_netdev: Introduce per-queue context
net: ntb_netdev: Make queue pair count configurable
net: ntb_netdev: Expose queue pair count via ethtool -l

drivers/net/ntb_netdev.c | 326 +++++++++++++++++++++++++++------------
1 file changed, 228 insertions(+), 98 deletions(-)

--
2.51.0