RE: [PATCH] IB/mlx4: delete allocated id_map_entry while sending REJ

From: Praveen Kannoju

Date: Mon Jun 08 2026 - 08:29:51 EST


Confidential - Oracle Restricted \Including External Recipients

Thank you, Jason.

Yes, ideally the REJ path should not be allocating memory at all. I plan to fix the allocation in the earlier stanza as well.

I have not yet captured a kmemleak report, which Leon also requested. I'll gather that data, update the patch accordingly, and send a revised version with a more precise commit message that better explains the issue and the object lifetime.

Regarding SIDR, I would prefer to handle it separately once I have a clearer understanding of the expected flow there.

-
Praveen.


Confidential - Oracle Restricted \Including External Recipients
> -----Original Message-----
> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Wednesday, June 3, 2026 12:37 AM
> To: Praveen Kannoju <praveen.kannoju@xxxxxxxxxx>
> Cc: yishaih@xxxxxxxxxx; leon@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx; Anand Khoje <anand.a.khoje@xxxxxxxxxx>;
> Manjunath Patil <manjunath.b.patil@xxxxxxxxxx>
> Subject: Re: [PATCH] IB/mlx4: delete allocated id_map_entry while sending REJ
>
> On Wed, May 06, 2026 at 09:08:24AM +0000, Praveen Kumar Kannoju
> wrote:
> > During scenarios where a REJ is sent after a REQ or REP, the allocated
> > is_map_entry remains in memory, resulting in a memory leak. Scheduling
> > the entry for deletion during REJ handling, if it is not NULL,
> > resolves the issue.
>
> Well, the leak seems quite likely, but I'm not sure about this fix.
>
> This code looks quite odd and it seems to have other races as well, so IDK..
>
> > Signed-off-by: Praveen Kumar Kannoju <praveen.kannoju@xxxxxxxxxx>
> > ---
> > drivers/infiniband/hw/mlx4/cm.c | 8 ++++----
> > 1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/mlx4/cm.c
> > b/drivers/infiniband/hw/mlx4/cm.c index 63a868a3822f..21f2f401ed61
> > 100644
> > --- a/drivers/infiniband/hw/mlx4/cm.c
> > +++ b/drivers/infiniband/hw/mlx4/cm.c
> > @@ -321,10 +321,9 @@ int mlx4_ib_multiplex_cm_handler(struct
> ib_device *ibdev, int port, int slave_id
> > __func__, slave_id, sl_cm_id);
> > return PTR_ERR(id);
> > }
> > - } else if (mad->mad_hdr.attr_id == CM_REJ_ATTR_ID ||
> > - mad->mad_hdr.attr_id == CM_SIDR_REP_ATTR_ID) {
> > + } else if (mad->mad_hdr.attr_id == CM_SIDR_REP_ATTR_ID)
> > return 0;
> > - } else {
> > + else {
> > sl_cm_id = get_local_comm_id(mad);
> > id = id_map_get(ibdev, &pv_cm_id, slave_id, sl_cm_id);
> > }
>
> What is this change for?
>
> It does look like ignoring the rej isn't right, but then also why does this rej just
> search and free but the rej in the prior stanza is allocating too?
>
> > @@ -338,7 +337,8 @@ int mlx4_ib_multiplex_cm_handler(struct ib_device
> > *ibdev, int port, int slave_id
> > cont:
> > set_local_comm_id(mad, id->pv_cm_id);
> >
> > - if (mad->mad_hdr.attr_id == CM_DREQ_ATTR_ID)
> > + if (mad->mad_hdr.attr_id == CM_DREQ_ATTR_ID ||
> > + mad->mad_hdr.attr_id == CM_REJ_ATTR_ID)
> > schedule_delayed(ibdev, id);
> > return 0;
> > }
>
> SIDR seems troubled as well.
>
> AI pointed out the use of id like this is racey too.
>
> But broadly this seems like it might be the right direction, but the commit
> message should explain what this logic is alot better
>
> Jason