Re: Reply: [External Email] Re: [PATCH][rdma-next] RDMA/erdma: Use NUMA-aware allocation for MTT tables

From: Cheng Xu

Date: Wed Feb 25 2026 - 20:50:35 EST

On 2/25/26 8:07 PM, Li,Rongqing(ACG CCN) wrote:
>
>>> On 2/25/26 4:51 PM, lirongqing wrote:
>>>> From: Li RongQing <lirongqing@xxxxxxxxx>
>>>>
>>>> Currently, MTT (Memory Translation Table) buffers are allocated
>>>> without NUMA awareness using kzalloc() and vzalloc(), which allocate
>>>> memory on the NUMA node of the calling CPU. This can lead to
>>>> cross-node memory access latencies if the erdma device is attached
>>>> to a different NUMA socket.
>>>>
>>>> Switch to kzalloc_node() and vzalloc_node() to ensure MTT buffers
>>>> are allocated on the local NUMA node of the PCIe device
>>>> (dev->attrs.numa_node).
>>>> This reduces latency for hardware access and improves performance.
>>>>
>>>> Signed-off-by: Li RongQing <lirongqing@xxxxxxxxx>
>>>> ---
>>>> drivers/infiniband/hw/erdma/erdma_verbs.c | 4 ++--
>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>
>>> Hi, Li RongQing,
>>>
>>> Thanks for the patch. However, I think it is better to keep the
>>> current behavior, for the following reasons:
>>>
>>> 1. This path is in the control plane, so allocating memory from a remote
>>> NUMA node should not have a noticeable performance impact.
>>
>> On a TLB miss or an internal cache miss, does the HCA need to query the MTT?
>>

This rarely happens in our chip.

>> [Li,Rongqing]
>>
>>> 2. With this change, the driver may fail the allocation when the local NUMA
>>> node is out of memory, even if other nodes still have available memory.
>>>
>
>
> When kmalloc_node() is called without __GFP_THISNODE and the target node
> lacks sufficient memory, SLUB falls back to allocating a folio from a
> node other than the requested one.
>

You are right; thank you for pointing this out.

Cheng Xu

> So I think this is not a problem.
>
> [Li,Rongqing]
>
>
>
>>> Thanks,
>>> Cheng Xu
>>>
>>>> diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.c b/drivers/infiniband/hw/erdma/erdma_verbs.c
>>>> index 9f74aad..58da6ef 100644
>>>> --- a/drivers/infiniband/hw/erdma/erdma_verbs.c
>>>> +++ b/drivers/infiniband/hw/erdma/erdma_verbs.c
>>>> @@ -604,7 +604,7 @@ static struct erdma_mtt *erdma_create_cont_mtt(struct erdma_dev *dev,
>>>> return ERR_PTR(-ENOMEM);
>>>>
>>>> mtt->size = size;
>>>> - mtt->buf = kzalloc(mtt->size, GFP_KERNEL);
>>>> + mtt->buf = kzalloc_node(mtt->size, GFP_KERNEL, dev->attrs.numa_node);
>>>> if (!mtt->buf)
>>>> goto err_free_mtt;
>>>>
>>>> @@ -729,7 +729,7 @@ static struct erdma_mtt *erdma_create_scatter_mtt(struct erdma_dev *dev,
>>>> return ERR_PTR(-ENOMEM);
>>>>
>>>> mtt->size = ALIGN(size, PAGE_SIZE);
>>>> - mtt->buf = vzalloc(mtt->size);
>>>> + mtt->buf = vzalloc_node(mtt->size, dev->attrs.numa_node);
>>>> mtt->continuous = false;
>>>> if (!mtt->buf)
>>>> goto err_free_mtt;