Re: [PATCH 0/3] net: ntb_netdev: Add Multi-queue support
From: Dave Jiang
Date: Wed Feb 25 2026 - 10:11:11 EST
On 2/24/26 8:36 PM, Koichiro Den wrote:
> On Tue, Feb 24, 2026 at 09:20:35AM -0700, Dave Jiang wrote:
>>
>>
>> On 2/24/26 8:28 AM, Koichiro Den wrote:
>>> Hi,
>>>
>>> ntb_netdev currently hard-codes a single NTB transport queue pair, which
>>> means the datapath effectively runs as a single-queue netdev regardless
>>> of available CPUs / parallel flows.
>>>
>>> The longer-term motivation here is throughput scale-out: allow
>>> ntb_netdev to grow beyond the single-QP bottleneck and make it possible
>>> to spread TX/RX work across multiple queue pairs as link speeds and core
>>> counts keep increasing.
>>>
>>> Multi-queue also unlocks the standard networking knobs on top of it. In
>>> particular, once the device exposes multiple TX queues, qdisc/tc can
>>> steer flows/traffic classes into different queues (via
>>> skb->queue_mapping), enabling per-flow/per-class scheduling and QoS in a
>>> familiar way.
>>>
>>> This series is a small plumbing step in that direction:
>>>
>>> 1) Introduce a per-queue context object (struct ntb_netdev_queue) and
>>> move queue-pair state out of struct ntb_netdev. Probe creates queue
>>> pairs in a loop and configures the netdev queue counts to match the
>>> number that was successfully created.
>>>
>>> 2) Expose ntb_num_queues as a module parameter to request multiple
>>> queue pairs at probe time. The value is clamped to 1..64 and kept
>>> read-only for now (no runtime reconfiguration).
>>>
>>> 3) Report the active queue-pair count via ethtool -l (get_channels),
>>> so users can confirm the device configuration from user space.
>>>
>>> Compatibility:
>>> - Default remains ntb_num_queues=1, so behaviour is unchanged unless
>>> the user explicitly requests more queues.
>>>
>>> Kernel base:
>>> - ntb-next latest:
>>> commit 7b3302c687ca ("ntb_hw_amd: Fix incorrect debug message in link
>>> disable path")
>>>
>>> Usage (example):
>>> - modprobe ntb_netdev ntb_num_queues=<N> # Patch 2 takes care of it
>>> - ethtool -l <ifname> # Patch 3 takes care of it
>>>
>>> Patch summary:
>>> 1/3 net: ntb_netdev: Introduce per-queue context
>>> 2/3 net: ntb_netdev: Make queue pair count configurable
>>> 3/3 net: ntb_netdev: Expose queue pair count via ethtool -l
>>>
>>> Testing / results:
>>> Environment / command line:
>>> - 2x R-Car S4 Spider boards
>>> "Kernel base" (see above) + this series
>>> - For TCP load:
>>> [RC] $ sudo iperf3 -s
>>> [EP] $ sudo iperf3 -Z -c ${SERVER_IP} -l 65480 -w 512M -P 4
>>> - For UDP load:
>>> [RC] $ sudo iperf3 -s
>>> [EP] $ sudo iperf3 -ub0 -c ${SERVER_IP} -l 65480 -w 512M -P 4
>>>
>>> Before (without this series):
>>> TCP / UDP : 602 Mbps / 598 Mbps
>>>
>>> Before (ntb_num_queues=1):
>>> TCP / UDP : 588 Mbps / 605 Mbps
>>
>> What accounts for the dip in TCP performance?
>
> I believe this is within normal run-to-run variance. To be sure, I repeated the
> TCP tests multiple times. The aggregated results are:
>
> +------+----------+------------------+------------------+
> | | Baseline | ntb_num_queues=1 | ntb_num_queues=2 |
> +------+----------+------------------+------------------+
> | Mean | 599.5 | 595.2 (-0.7%) | 600.4 (+0.2%) |
> | Min | 590 | 590 (+0.0%) | 593 (+0.5%) |
> | Max | 605 | 604 (-0.2%) | 605 (+0.0%) |
> | Med | 602 | 593 | 601.5 |
> | SD | 5.84 | 6.01 | 4.12 |
> +------+----------+------------------+------------------+
>
> On my setup (2x R-Car S4 Spider), I do not observe any statistically meaningful
> improvement or degradation. For completeness, here is the raw data:
>
> .----------------------------- Baseline (without this series)
> : .----------------- ntb_num_queues=1
> : : .---- ntb_num_queues=2
> : : :
> #1 601 Mbps 604 Mbps 601 Mbps
> #2 604 Mbps 604 Mbps 603 Mbps
> #3 592 Mbps 590 Mbps 600 Mbps
> #4 593 Mbps 593 Mbps 603 Mbps
> #5 605 Mbps 591 Mbps 605 Mbps
> #6 590 Mbps 603 Mbps 602 Mbps
> #7 605 Mbps 590 Mbps 596 Mbps
> #8 598 Mbps 594 Mbps 593 Mbps
> #9 603 Mbps 590 Mbps 605 Mbps
> #10 604 Mbps 593 Mbps 596 Mbps
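For what it's worth, the summary table above can be reproduced from these raw runs with a short shell/awk sketch (using the sample standard deviation, which matches the SD row):

```shell
# Recompute mean/min/max/median/SD for a column of the raw data above.
stats() {
    printf '%s\n' "$@" | sort -n | awk '
        { x[NR] = $1; s += $1; ss += $1 * $1 }
        END {
            n = NR
            mean = s / n
            sd = sqrt((ss - s * s / n) / (n - 1))   # sample SD
            med = (n % 2) ? x[(n + 1) / 2] : (x[n / 2] + x[n / 2 + 1]) / 2
            printf "mean=%.1f min=%d max=%d med=%.1f sd=%.2f\n", mean, x[1], x[n], med, sd
        }'
}
stats 601 604 592 593 605 590 605 598 603 604  # Baseline:         mean=599.5 min=590 max=605 med=602.0 sd=5.84
stats 604 604 590 593 591 603 590 594 590 593  # ntb_num_queues=1: mean=595.2 min=590 max=604 med=593.0 sd=6.01
stats 601 603 600 603 605 602 596 593 605 596  # ntb_num_queues=2: mean=600.4 min=593 max=605 med=601.5 sd=4.12
```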
>
> To see a tangible performance gain, another patch series I submitted yesterday
> is also relevant:
>
> [PATCH 00/10] NTB: epf: Enable per-doorbell bit handling while keeping legacy offset
> https://lore.kernel.org/all/20260224133459.1741537-1-den@xxxxxxxxxxxxx/
>
> With that series applied as well, and with irq smp_affinity properly adjusted,
> the results become:
>
> After (ntb_num_queues=2 + the other series also applied):
> TCP / UDP : 1.15 Gbps / 1.18 Gbps
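For reference, "irq smp_affinity properly adjusted" can be done along these lines. This is a hypothetical sketch: the /ntb/ match string and the one-CPU-per-vector assignment depend entirely on the platform, so check /proc/interrupts on the actual system first.

```shell
# Spread the NTB-related IRQs across CPUs, one per vector.
# The /ntb/ pattern is a placeholder; adjust it to match how the
# interrupt vectors are actually named on the system.
cpu=0
for irq in $(awk -F: '/ntb/ { gsub(/[^0-9]/, "", $1); print $1 }' /proc/interrupts); do
    echo "$cpu" > "/proc/irq/$irq/smp_affinity_list"
    cpu=$((cpu + 1))
done
```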
>
> In that sense, that series is also important groundwork from a performance
> perspective. Since that work touches NTB-tree code, I'd appreciate it if you
> could also have a look at that series.
>
> Side note: R-Car S4 Spider has limited BAR resources. Although BAR2 is
> resizable, ~2 MiB appears to be the practical ceiling for arbitrary mappings in
> this setup, so I haven't tested larger ntb_num_queues=<N> values. On platforms
> with more BAR space, sufficient CPUs for memcpy, or sufficient DMA channels
> available to ntb_transport for DMA memcpy, further scaling with larger <N> values
> should be possible.
Thanks for the data. I'll take a look at the other series.
>
> Thanks,
> Koichiro
>
>>
>>>
>>> After (ntb_num_queues=2):
>>> TCP / UDP : 602 Mbps / 598 Mbps
>>>
>>> Notes:
>>> In my current test environment, enabling multiple queue pairs does
>>> not improve throughput. The receive-side memcpy in ntb_transport is
>>> the dominant cost and limits scaling at present.
>>>
>>> Still, this series lays the groundwork for future scaling, for
>>> example once a transport backend is introduced that avoids memcpy
>>> to/from PCI memory space on both ends (see the superseded RFC
>>> series:
>>> https://lore.kernel.org/all/20251217151609.3162665-1-den@xxxxxxxxxxxxx/).
>>>
>>>
>>> Best regards,
>>> Koichiro
>>>
>>> Koichiro Den (3):
>>> net: ntb_netdev: Introduce per-queue context
>>> net: ntb_netdev: Make queue pair count configurable
>>> net: ntb_netdev: Expose queue pair count via ethtool -l
>>>
>>> drivers/net/ntb_netdev.c | 326 +++++++++++++++++++++++++++------------
>>> 1 file changed, 228 insertions(+), 98 deletions(-)
>>>
>>
>> for the series
>> Reviewed-by: Dave Jiang <dave.jiang@xxxxxxxxx>
>>