Re: [PATCH 0/3] net: ntb_netdev: Add Multi-queue support

From: Koichiro Den

Date: Tue Feb 24 2026 - 22:36:31 EST


On Tue, Feb 24, 2026 at 09:20:35AM -0700, Dave Jiang wrote:
>
>
> On 2/24/26 8:28 AM, Koichiro Den wrote:
> > Hi,
> >
> > ntb_netdev currently hard-codes a single NTB transport queue pair, which
> > means the datapath effectively runs as a single-queue netdev regardless
> > of available CPUs / parallel flows.
> >
> > The longer-term motivation here is throughput scale-out: allow
> > ntb_netdev to grow beyond the single-QP bottleneck and make it possible
> > to spread TX/RX work across multiple queue pairs as link speeds and core
> > counts keep increasing.
> >
> > Multi-queue also unlocks the standard networking knobs on top of it. In
> > particular, once the device exposes multiple TX queues, qdisc/tc can
> > steer flows/traffic classes into different queues (via
> > skb->queue_mapping), enabling per-flow/per-class scheduling and QoS in a
> > familiar way.
> >
> > This series is a small plumbing step towards that direction:
> >
> > 1) Introduce a per-queue context object (struct ntb_netdev_queue) and
> > move queue-pair state out of struct ntb_netdev. Probe creates queue
> > pairs in a loop and configures the netdev queue counts to match the
> > number that was successfully created.
> >
> > 2) Expose ntb_num_queues as a module parameter to request multiple
> > queue pairs at probe time. The value is clamped to 1..64 and kept
> > read-only for now (no runtime reconfiguration).
> >
> > 3) Report the active queue-pair count via ethtool -l (get_channels),
> > so users can confirm the device configuration from user space.
> >
> > Compatibility:
> > - Default remains ntb_num_queues=1, so behaviour is unchanged unless
> > the user explicitly requests more queues.
> >
> > Kernel base:
> > - ntb-next latest:
> > commit 7b3302c687ca ("ntb_hw_amd: Fix incorrect debug message in link
> > disable path")
> >
> > Usage (example):
> > - modprobe ntb_netdev ntb_num_queues=<N> # Patch 2 takes care of it
> > - ethtool -l <ifname> # Patch 3 takes care of it
> >
> > Patch summary:
> > 1/3 net: ntb_netdev: Introduce per-queue context
> > 2/3 net: ntb_netdev: Make queue pair count configurable
> > 3/3 net: ntb_netdev: Expose queue pair count via ethtool -l
> >
> > Testing / results:
> > Environment / command line:
> > - 2x R-Car S4 Spider boards
> > "Kernel base" (see above) + this series
> > - For TCP load:
> > [RC] $ sudo iperf3 -s
> > [EP] $ sudo iperf3 -Z -c ${SERVER_IP} -l 65480 -w 512M -P 4
> > - For UDP load:
> > [RC] $ sudo iperf3 -s
> > [EP] $ sudo iperf3 -u -b 0 -c ${SERVER_IP} -l 65480 -w 512M -P 4
> >
> > Before (without this series):
> > TCP / UDP : 602 Mbps / 598 Mbps
> >
> > Before (ntb_num_queues=1):
> > TCP / UDP : 588 Mbps / 605 Mbps
>
> What accounts for the dip in TCP performance?

I believe this is within normal run-to-run variance. To be sure, I repeated the
TCP tests multiple times. The aggregated results are:

+------+----------+------------------+------------------+
|      | Baseline | ntb_num_queues=1 | ntb_num_queues=2 |
+------+----------+------------------+------------------+
| Mean |    599.5 |    595.2 (-0.7%) |    600.4 (+0.2%) |
| Min  |      590 |      590 (+0.0%) |      593 (+0.5%) |
| Max  |      605 |      604 (-0.2%) |      605 (+0.0%) |
| Med  |      602 |      593         |      601.5       |
| SD   |     5.84 |     6.01         |     4.12         |
+------+----------+------------------+------------------+

On my setup (2x R-Car S4 Spider), I do not observe any statistically meaningful
improvement or degradation. For completeness, here is the raw data:

    .----------------------------- Baseline (without this series)
    :          .------------------ ntb_num_queues=1
    :          :          .------- ntb_num_queues=2
    :          :          :
#1  601 Mbps   604 Mbps   601 Mbps
#2  604 Mbps   604 Mbps   603 Mbps
#3  592 Mbps   590 Mbps   600 Mbps
#4  593 Mbps   593 Mbps   603 Mbps
#5  605 Mbps   591 Mbps   605 Mbps
#6  590 Mbps   603 Mbps   602 Mbps
#7  605 Mbps   590 Mbps   596 Mbps
#8  598 Mbps   594 Mbps   593 Mbps
#9  603 Mbps   590 Mbps   605 Mbps
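
In case anyone wants to double-check the summary table, it can be recomputed
from these raw runs, e.g. with Python's statistics module (the SD row is the
sample standard deviation, i.e. n-1 in the denominator):

```python
# Recompute Mean/Min/Max/Med/SD from the raw iperf3 runs listed above.
import statistics as st

runs = {
    "baseline": [601, 604, 592, 593, 605, 590, 605, 598, 603, 604],
    "nq1":      [604, 604, 590, 593, 591, 603, 590, 594, 590, 593],
    "nq2":      [601, 603, 600, 603, 605, 602, 596, 593, 605, 596],
}

for name, xs in runs.items():
    # st.stdev() is the sample (n-1) standard deviation, matching the table.
    print(f"{name}: mean={st.mean(xs):.1f} min={min(xs)} max={max(xs)} "
          f"med={st.median(xs):g} sd={st.stdev(xs):.2f}")
```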

To see a tangible performance gain, another patch series I submitted yesterday
is also relevant:

[PATCH 00/10] NTB: epf: Enable per-doorbell bit handling while keeping legacy offset
https://lore.kernel.org/all/20260224133459.1741537-1-den@xxxxxxxxxxxxx/

With that series applied as well, and with irq smp_affinity properly adjusted,
the results become:

After (ntb_num_queues=2 + the other series also applied):
TCP / UDP : 1.15 Gbps / 1.18 Gbps
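
For reference, the smp_affinity adjustment above amounts to locating the NTB
transport IRQs in /proc/interrupts and writing CPU ids to
/proc/irq/<N>/smp_affinity_list. A rough, hypothetical sketch of that step
(the "ntb" match string and the IRQ names in the demo are assumptions; actual
names depend on the platform's NTB driver):

```python
# Hypothetical helper, not part of the series: spread NTB IRQs across CPUs.
import re

def find_irqs(interrupts_text, match="ntb"):
    """Return IRQ numbers from /proc/interrupts whose line mentions `match`."""
    irqs = []
    for line in interrupts_text.splitlines():
        m = re.match(r"\s*(\d+):", line)
        if m and match in line:
            irqs.append(int(m.group(1)))
    return irqs

def spread(irqs, cpus):
    """Assign IRQs to CPUs round-robin. A root user would then write each
    CPU id to /proc/irq/<irq>/smp_affinity_list."""
    return {irq: cpus[i % len(cpus)] for i, irq in enumerate(irqs)}

# Dry-run demo on a fabricated /proc/interrupts excerpt:
sample = (" 42:   0   0  GICv3  ntb-db0\n"
          " 43:   0   0  GICv3  ntb-db1\n"
          " 44:   0   0  GICv3  eth0\n")
print(spread(find_irqs(sample), cpus=[1, 2]))  # {42: 1, 43: 2}
```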

In that sense, that series is also important groundwork from a performance
perspective. Since it touches NTB-tree code, I'd appreciate it if you could
take a look at it as well.

Side note: R-Car S4 Spider has limited BAR resources. Although BAR2 is
resizable, ~2 MiB appears to be the practical ceiling for arbitrary mappings in
this setup, so I haven't tested larger ntb_num_queues=<N> values. On platforms
with more BAR space, sufficient CPUs for memcpy, or sufficient DMA channels for
DMA memcpy available to ntb_transport, further scaling with larger <N> values
should be possible.

Thanks,
Koichiro

>
> >
> > After (ntb_num_queues=2):
> > TCP / UDP : 602 Mbps / 598 Mbps
> >
> > Notes:
> > In my current test environment, enabling multiple queue pairs does
> > not improve throughput. The receive-side memcpy in ntb_transport is
> > the dominant cost and limits scaling at present.
> >
> > Still, this series lays the groundwork for future scaling, for
> > example once a transport backend is introduced that avoids memcpy
> > to/from PCI memory space on both ends (see the superseded RFC
> > series:
> > https://lore.kernel.org/all/20251217151609.3162665-1-den@xxxxxxxxxxxxx/).
> >
> >
> > Best regards,
> > Koichiro
> >
> > Koichiro Den (3):
> > net: ntb_netdev: Introduce per-queue context
> > net: ntb_netdev: Make queue pair count configurable
> > net: ntb_netdev: Expose queue pair count via ethtool -l
> >
> > drivers/net/ntb_netdev.c | 326 +++++++++++++++++++++++++++------------
> > 1 file changed, 228 insertions(+), 98 deletions(-)
> >
>
> for the series
> Reviewed-by: Dave Jiang <dave.jiang@xxxxxxxxx>
>