RE: [PATCH v2] IB/mlx4: delete allocated id_map_entry while sending REJ
From: Praveen Kannoju
Date: Mon Jun 15 2026 - 13:28:54 EST
Confidential - Oracle Restricted \Including External Recipients
Hi Leon,
You had earlier asked for the kmemleak output for v1 of this patch.
The leak is not expected to show up in kmemleak output. The leaked `id_map_entry` is not orphaned from the kernel object graph: after allocation, it remains linked on `sriov->cm_list` and is also indexed by the driver’s id-map data structures. Kmemleak reports only allocations that become unreachable from scanned roots. As long as the driver retains a reference to an `id_map_entry` in those live containers, kmemleak treats the object as reachable, even if the CM protocol lifetime indicates that the entry should have been deleted when the REJ was sent.
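To illustrate the reachability argument, here is a minimal user-space sketch. It is only a model, not kmemleak or mlx4 code: `struct entry`, `struct root` and the helper names are hypothetical stand-ins for `id_map_entry` and a live container such as `sriov->cm_list`. An entry that is allocated and never deleted is still found by walking from the root, so a reachability-based detector has no reason to flag it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's id_map_entry. */
struct entry {
	struct entry *next;
	int sl_cm_id;
};

/* Hypothetical stand-in for a live container (a scanned root). */
struct root {
	struct entry *head;
};

/* Allocate an entry and link it into the container. */
static struct entry *entry_alloc(struct root *r, int sl_cm_id)
{
	struct entry *e = malloc(sizeof(*e));

	assert(e != NULL);
	e->sl_cm_id = sl_cm_id;
	e->next = r->head;
	r->head = e;
	return e;
}

/* Unlink and free an entry; returns true if it was linked. */
static bool entry_delete(struct root *r, struct entry *e)
{
	struct entry **pp;

	for (pp = &r->head; *pp; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;
			free(e);
			return true;
		}
	}
	return false;
}

/* Reachability check: is the entry found by walking from the root? */
static bool reachable(const struct root *r, const struct entry *e)
{
	const struct entry *it;

	for (it = r->head; it; it = it->next)
		if (it == e)
			return true;
	return false;
}

/* Number of entries the container still holds. */
static size_t live_count(const struct root *r)
{
	const struct entry *it;
	size_t n = 0;

	for (it = r->head; it; it = it->next)
		n++;
	return n;
}
```

In this model, an entry created with entry_alloc() and never passed to entry_delete() keeps reachable() returning true and keeps live_count() from dropping. That is the situation described above: the memory is leaked with respect to the protocol lifetime, but it is never unreferenced.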
-
Praveen.
> -----Original Message-----
> From: Praveen Kumar Kannoju <praveen.kannoju@xxxxxxxxxx>
> Sent: Monday, June 15, 2026 10:48 PM
> To: yishaih@xxxxxxxxxx; jgg@xxxxxxxx; leon@xxxxxxxxxx;
> linux-rdma@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Cc: Manjunath Patil <manjunath.b.patil@xxxxxxxxxx>; Anand Khoje
> <anand.a.khoje@xxxxxxxxxx>; Praveen Kannoju
> <praveen.kannoju@xxxxxxxxxx>
> Subject: [PATCH v2] IB/mlx4: delete allocated id_map_entry while sending REJ
>
> The mlx4 CM paravirtualization layer rewrites a VF's local communication ID to
> a PF-visible ID when CM MADs are sent from the VF.
> For messages that start or advance a connection from the VF side, such as
> REQ, REP, MRA and SIDR_REQ, mlx4_ib_multiplex_cm_handler() allocates an
> id_map_entry when no existing mapping is found.
>
> A REJ is different because it is a terminal response to an already known
> exchange. It should either find an existing id_map_entry, rewrite the local
> communication ID, and schedule that entry for deletion, or it should pass
> through unchanged when no mapping exists.
>
> Some REJ messages, such as rejects for an inbound REQ before an MRA or REP
> was sent, do not have an id_map_entry because their local_comm_id is zero.
> Timeout REJ messages are handled in the initial lookup branch, but a lookup
> miss there must not fall through to id_map_alloc(); such a miss means there is
> no existing mapping to translate or delete for the REJ.
>
> Commit 227a0e142e37 ("IB/mlx4: Add support for REJ due to timeout")
> added the timeout REJ case to the initial branch so an outgoing timeout REJ
> could reuse the id_map_entry that was created when the VF's REQ was
> multiplexed. Reusing that entry is the useful part: it rewrites the timeout REJ
> local_comm_id to the same PF-visible ID that was sent in the REQ. If the
> lookup misses, allocating a new id_map_entry does not help because the peer
> has never seen that new PF-visible ID, and REJ is not starting a new exchange.
>
> Keep timeout REJ handling in the initial lookup branch, but return before
> allocation if no mapping is found. Handle the other REJ cases with the same
> lookup-only behavior. When a mapping is found, translate the local
> communication ID and schedule delayed deletion, as is already done for DREQ
> and for received REJ in the demux path. When no mapping is found, keep the
> existing pass-through behavior.
>
> Signed-off-by: Praveen Kumar Kannoju <praveen.kannoju@xxxxxxxxxx>
> ---
> drivers/infiniband/hw/mlx4/cm.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx4/cm.c b/drivers/infiniband/hw/mlx4/cm.c
> index 63a868a3822f..202fd5365e35 100644
> --- a/drivers/infiniband/hw/mlx4/cm.c
> +++ b/drivers/infiniband/hw/mlx4/cm.c
> @@ -315,14 +315,20 @@ int mlx4_ib_multiplex_cm_handler(struct ib_device *ibdev, int port, int slave_id
>  		id = id_map_get(ibdev, &pv_cm_id, slave_id, sl_cm_id);
>  		if (id)
>  			goto cont;
> +		if (mad->mad_hdr.attr_id == CM_REJ_ATTR_ID)
> +			return 0;
>  		id = id_map_alloc(ibdev, slave_id, sl_cm_id);
>  		if (IS_ERR(id)) {
>  			mlx4_ib_warn(ibdev, "%s: id{slave: %d, sl_cm_id: 0x%x} Failed to id_map_alloc\n",
>  				__func__, slave_id, sl_cm_id);
>  			return PTR_ERR(id);
>  		}
> -	} else if (mad->mad_hdr.attr_id == CM_REJ_ATTR_ID ||
> -		   mad->mad_hdr.attr_id == CM_SIDR_REP_ATTR_ID) {
> +	} else if (mad->mad_hdr.attr_id == CM_REJ_ATTR_ID) {
> +		sl_cm_id = get_local_comm_id(mad);
> +		id = id_map_get(ibdev, &pv_cm_id, slave_id, sl_cm_id);
> +		if (!id)
> +			return 0;
> +	} else if (mad->mad_hdr.attr_id == CM_SIDR_REP_ATTR_ID) {
>  		return 0;
>  	} else {
>  		sl_cm_id = get_local_comm_id(mad);
> @@ -338,7 +344,8 @@ int mlx4_ib_multiplex_cm_handler(struct ib_device *ibdev, int port, int slave_id
>  cont:
>  	set_local_comm_id(mad, id->pv_cm_id);
>
> -	if (mad->mad_hdr.attr_id == CM_DREQ_ATTR_ID)
> +	if (mad->mad_hdr.attr_id == CM_DREQ_ATTR_ID ||
> +	    mad->mad_hdr.attr_id == CM_REJ_ATTR_ID)
>  		schedule_delayed(ibdev, id);
>  	return 0;
>  }
> --
> 2.43.7