Re: [PATCH 0/4] NTB: ntb_transport: DMA fixes and scalability improvements

From: Dave Jiang

Date: Mon Dec 01 2025 - 14:05:12 EST




On 11/30/25 9:58 PM, Koichiro Den wrote:
> On Mon, Oct 27, 2025 at 09:43:27AM +0900, Koichiro Den wrote:
>> This series contains two DMA-related fixes (Patch #1-2) and two scalability
>> improvements (Patch #3-4) for ntb_transport. Behavior remains unchanged
>> unless new module parameters are explicitly set.
>>
>> New module parameters
>> =====================
>>
>> - use_tx_dma : Enable TX DMA independently (default: 0)
>> - use_rx_dma : Enable RX DMA independently (default: 0)
>> - num_tx_dma_chan : # of TX DMA channels per queue (default: 1)
>> - num_rx_dma_chan : # of RX DMA channels per queue (default: 1)
>>
>> Note: the legacy 'use_dma' switch is kept and takes precedence.
>> Enabling it always implies use_tx_dma=1 and use_rx_dma=1,
>> even if use_(tx|rx)_dma=0 is also given.
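[Annotation] The precedence rule in the note above can be sketched as a small userspace C helper. This is only an illustration under stated assumptions: the struct names and resolve_dma() are hypothetical and are not taken from the patch; only the three switch names (use_dma, use_tx_dma, use_rx_dma) come from the cover letter.

```c
#include <stdbool.h>

/* Hypothetical container for the switches named in the cover letter. */
struct dma_params {
	bool use_dma;    /* legacy switch, takes precedence when set */
	bool use_tx_dma; /* new per-direction switch (default: 0) */
	bool use_rx_dma; /* new per-direction switch (default: 0) */
};

/* Hypothetical result: which directions end up using DMA. */
struct dma_effective {
	bool tx;
	bool rx;
};

/*
 * Illustrative helper, not the driver's code: the legacy switch forces
 * both directions on, otherwise each direction follows its own switch.
 */
static struct dma_effective resolve_dma(struct dma_params p)
{
	struct dma_effective e;

	e.tx = p.use_dma || p.use_tx_dma;
	e.rx = p.use_dma || p.use_rx_dma;
	return e;
}
```

Under this sketch, use_dma=1 with use_rx_dma=0 still yields DMA in both directions, while use_dma=0 with use_rx_dma=1 enables RX DMA only.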
>>
>> Performance measurement
>> =======================
>>
>> Tested on R-Car S4. With the following patchsets applied [1]:
>>
>> - [RFC PATCH 00/25] NTB/PCI: Add DW eDMA intr fallback and BAR MW offsets
>> (https://lore.kernel.org/all/20251023071916.901355-1-den@xxxxxxxxxxxxx/)
>> - [PATCH 0/2] Add 'tx_memcpy_offload' option to ntb_transport
>> (https://lore.kernel.org/all/20251023072105.901707-1-den@xxxxxxxxxxxxx/)
>>
>> throughput became bound by RX DMA service rate. Increasing the number of
>> RX DMA channels (>1) improved throughput substantially:
>>
>> - use_rx_dma=1 num_rx_dma_chan=1
>> ^^^^^^^^^^^^^^^^^
>> (full command: $ sudo modprobe ntb_transport tx_memcpy_offload=1 use_rx_dma=1 num_rx_dma_chan=1 use_intr=1)
>>
>> $ sudo sockperf tp -i $SERVER_IP -m 65400 -t 10 # RX DMA n_chan=1
>> sockperf: == version #3.10-no.git ==
>> [...]
>> sockperf: Summary: Message Rate is 8636 [msg/sec], Packet Rate is about 388620 [pkt/sec] (45 ip frags / msg)
>> sockperf: Summary: BandWidth is 538.630 MBps (4309.039 Mbps)
>> ^^^^^^^^^^^^^
>>
>> - use_rx_dma=1 num_rx_dma_chan=2
>> ^^^^^^^^^^^^^^^^^
>> (full command: $ sudo modprobe ntb_transport tx_memcpy_offload=1 use_rx_dma=1 num_rx_dma_chan=2 use_intr=1)
>>
>> $ sudo sockperf tp -i $SERVER_IP -m 65400 -t 10 # RX DMA n_chan=2
>> sockperf: == version #3.10-no.git ==
>> [...]
>> sockperf: Summary: Message Rate is 14283 [msg/sec], Packet Rate is about 642735 [pkt/sec] (45 ip frags / msg)
>> sockperf: Summary: BandWidth is 890.835 MBps (7126.680 Mbps)
>> ^^^^^^^^^^^^^
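[Annotation] As a cross-check of the two sockperf summaries, the figures can be reproduced from the message rate and the 65400-byte message size. The sketch below assumes that sockperf's "MBps" means 2^20 bytes per second; that is inferred from the reported numbers lining up, not from sockperf documentation. The function names are hypothetical.

```c
/* Message rate [msg/sec] times message size [bytes], in units of 2^20 bytes/sec. */
static double msgrate_to_mibps(double msgs_per_sec, double msg_bytes)
{
	return msgs_per_sec * msg_bytes / (1024.0 * 1024.0);
}

/* Packet rate, given the number of IP fragments per message (45 in the logs). */
static double msgrate_to_pktrate(double msgs_per_sec, double frags_per_msg)
{
	return msgs_per_sec * frags_per_msg;
}

/* Ratio of the two-channel bandwidth to the one-channel bandwidth. */
static double speedup(double after, double before)
{
	return after / before;
}
```

With the logged values, 8636 msg/sec gives about 538.63 and 14283 msg/sec gives about 890.83, matching the reported bandwidths. The ratio is roughly 1.65, so going from one RX DMA channel to two improved throughput by about 65% in this measurement.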
>>
>> [1] Additional changes are required to use DMA on R-Car S4. Those will be
>> posted separately.
>>
>>
>> Koichiro Den (4):
>> NTB: ntb_transport: Handle remapped contiguous region in vmalloc space
>> NTB: ntb_transport: Ack DMA memcpy descriptors to avoid wait-list
>> growth
>> NTB: ntb_transport: Add module parameters use_tx_dma/use_rx_dma
>> NTB: ntb_transport: Support multi-channel DMA via module parameters
>>
>> drivers/ntb/ntb_transport.c | 386 +++++++++++++++++++++++++-----------
>> 1 file changed, 270 insertions(+), 116 deletions(-)
>>
>> --
>> 2.48.1
>>
>
> Hi Dave,
>
> As a quick update, this series is likely to be superseded by separate
> work, the "NTB transport backed by remote DW eDMA" series:
> https://lore.kernel.org/all/20251129160405.2568284-1-den@xxxxxxxxxxxxx/
> On R-Car S4, the remote eDMA-based approach clearly outperforms the
> existing architecture, which relies on a DMA_MEMCPY engine.

Does it use a different transport?

>
> Do you think it would be worth moving this older series forward?
> (I'm not sure whether others are interested in this series,
> perhaps on platforms other than R-Car S4.)

I guess it doesn't hurt. Jon?

>
> Thank you in advance,
>
> Koichiro