Re: Reply: [External Mail] Re: [PATCH][rdma-next] RDMA/erdma: Use NUMA-aware allocation for MTT tables
From: Leon Romanovsky
Date: Thu Feb 26 2026 - 02:10:36 EST
On Thu, Feb 26, 2026 at 09:50:00AM +0800, Cheng Xu wrote:
>
>
> On 2/25/26 8:07 PM, Li,Rongqing(ACG CCN) wrote:
> >
> >>> On 2/25/26 4:51 PM, lirongqing wrote:
> >>>> From: Li RongQing <lirongqing@xxxxxxxxx>
> >>>>
> >>>> Currently, MTT (Memory Translation Table) buffers are allocated
> >>>> without NUMA awareness using kzalloc() and vzalloc(), which allocate
> >>>> memory on the NUMA node of the calling CPU. This can lead to
> >>>> cross-node memory access latencies if the erdma device is attached
> >>>> to a different NUMA socket.
> >>>>
> >>>> Switch to kzalloc_node() and vzalloc_node() to ensure MTT buffers
> >>>> are allocated on the local NUMA node of the PCIe device
> >>>> (dev->attrs.numa_node).
> >>>> This reduces latency for hardware access and improves performance.
> >>>>
> >>>> Signed-off-by: Li RongQing <lirongqing@xxxxxxxxx>
> >>>> ---
> >>>> drivers/infiniband/hw/erdma/erdma_verbs.c | 4 ++--
> >>>> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>>>
> >>>
> >>> Hi, Li RongQing,
> >>>
> >>> Thanks for the patch. However, I think it is better to keep the
> >>> current behavior, for the following reasons:
> >>>
> >>> 1. This path is in the control plane, so allocating memory from a remote
> >>> NUMA node should not have a noticeable performance impact.
> >>
> >> On a TLB miss or an internal cache miss, does the HCA need to query the MTT?
> >>
>
> This rarely happens in our chip.

So why do we need this patch? The xxx_node() functions are useful when you
need to force allocation on a specific NUMA node. In most cases, a plain
kmalloc() will allocate memory on the same node as 'struct erdma_dev *dev',
which typically matches the PCI device's NUMA node.

Please avoid vague phrasing like 'potentially improves performance' in the
commit message and responses. It adds no meaningful information.

Also, please remove the dev->attrs.numa_node caching from erdma and rely on
dev_to_node() instead.

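Something along these lines, as a rough userspace sketch rather than actual
erdma code. The struct layouts, the calloc()-backed kzalloc_node() stub, the
last_node_requested hook and the mtt_buf_alloc() helper are all hypothetical
stand-ins so that it builds outside the kernel tree; only the shape of the
dev_to_node() and kzalloc_node() calls is meant to mirror the kernel API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for kernel definitions (assumptions, simplified for userspace). */
#define GFP_KERNEL   0
#define NUMA_NO_NODE (-1)

struct device  { int numa_node; };
struct pci_dev { struct device dev; };

/* Assumed layout: the driver keeps only the pci_dev pointer and no
 * private copy of the NUMA node. */
struct erdma_dev { struct pci_dev *pdev; };

/* Same shape as the kernel helper: report the node the device sits on. */
static int dev_to_node(struct device *dev)
{
	return dev->numa_node;
}

/* Test hook for this sketch only: remembers the node of the last request. */
static int last_node_requested = NUMA_NO_NODE;

/* Stub with the kernel's argument order (size, flags, node). A real
 * kzalloc_node() would place the memory on 'node'; this one only records
 * the request and returns zeroed heap memory. */
static void *kzalloc_node(size_t size, int flags, int node)
{
	(void)flags;
	last_node_requested = node;
	return calloc(1, size);
}

/* Hypothetical helper: the node is looked up from the struct device at
 * allocation time instead of being read from a cached driver field. */
static void *mtt_buf_alloc(struct erdma_dev *dev, size_t size)
{
	return kzalloc_node(size, GFP_KERNEL, dev_to_node(&dev->pdev->dev));
}
```

With this shape, a caller only passes the erdma_dev pointer and the node
always comes from the device itself, so there is no second copy to keep in
sync.
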
Thanks