Re: [PATCH] rdma: infiniband: Added __alloc_cq request value Return value non-zero value determination

From: luoqing

Date: Thu May 28 2026 - 02:55:25 EST


On Tue, May 26, 2026 at 09:23:29AM -0300, Jason Gunthorpe wrote:
> On Tue, May 26, 2026 at 05:18:16PM +0800, luoqing wrote:
> > From: luoqing <luoqing@xxxxxxxxxx>
> >
> > Currently, when __alloc_cq allocates memory for an InfiniBand Completion Queue (ib_cq) object,
> > it uses memory allocation functions that may not guarantee zero-initialization under certain error paths or memory pressure conditions.
> > If the allocated ib_cq object contains non-zero garbage data due to incomplete initialization,
> > the function may return a non-NULL pointer even though the object is not in a valid state. This can lead to undefined behavior,
> > memory corruption, and potential kernel crashes when the driver subsequently accesses uninitialized fields.
> >
> > This patch adds explicit validation to ensure that the allocated ib_cq object is properly zeroed before being considered valid.
> > If the object fails the zero-check (i.e., contains non-zero bytes beyond expected initialized fields),
> > the function returns an error code (e.g., -ENOMEM or -EINVAL), logs a warning message, and prevents further usage of the corrupted CQ.
> >
> > Signed-off-by: luoqing <luoqing@xxxxxxxxxx>
> > ---
> > drivers/infiniband/core/cq.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> > index 3d7b6cddd131..756bc33c850d 100644
> > --- a/drivers/infiniband/core/cq.c
> > +++ b/drivers/infiniband/core/cq.c
> > @@ -224,7 +224,7 @@ struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private, int nr_cqe,
> > return ERR_PTR(-EINVAL);
> >
> > cq = rdma_zalloc_drv_obj(dev, ib_cq);
> > - if (!cq)
> > + if (unlikely(ZERO_OR_NULL_PTR(cq)))
> > return ERR_PTR(ret);
>
> Wow, this entire report is unintelligible.
>
> ZERO_OR_NULL_PTR() has nothing to do with the memory contents.
>
> Jason

Hi Jason,

Thank you for your quick response, and sorry for the confusion in my previous explanation.
Let me try to restate the issue more clearly.

In __ib_alloc_cq(), we allocate an ib_cq object using rdma_zalloc_drv_obj(), which returns zero-initialized memory on success.
However, if the requested size is zero, the allocator returns ZERO_SIZE_PTR ((void *)16) rather than NULL. The current code only checks !cq, so it treats that value as a successful allocation.
A zero size can be requested when the driver's object size is not properly validated in some driver registration paths.
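
The difference between the two checks can be seen in a small userspace sketch. It is not kernel code: the two macros are copied from include/linux/slab.h (with uintptr_t in place of unsigned long), and fake_zalloc() is a hypothetical stand-in that only mimics how kzalloc() answers a zero-byte request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace copies of the kernel macros from include/linux/slab.h. */
#define ZERO_SIZE_PTR ((void *)16)
#define ZERO_OR_NULL_PTR(x) ((uintptr_t)(x) <= (uintptr_t)ZERO_SIZE_PTR)

/*
 * Hypothetical stand-in for a zeroing allocator. Like kzalloc(), it hands
 * back ZERO_SIZE_PTR for a zero-byte request instead of NULL. The shared
 * static pool is only good enough for this demonstration.
 */
static void *fake_zalloc(size_t size)
{
	static unsigned char pool[64]; /* static storage starts out zeroed */

	if (size == 0)
		return ZERO_SIZE_PTR;
	if (size > sizeof(pool))
		return NULL; /* simulate allocation failure */
	return pool;
}

/* The check currently in __ib_alloc_cq(): only NULL counts as failure. */
static int passes_null_check(const void *cq)
{
	return cq != NULL;
}

/* The proposed check: NULL and ZERO_SIZE_PTR both count as failure. */
static int passes_zero_or_null_check(const void *cq)
{
	return !ZERO_OR_NULL_PTR(cq);
}
```

With these helpers, fake_zalloc(0) passes the plain NULL check but is rejected by the ZERO_OR_NULL_PTR()-based check, while an ordinary non-zero allocation passes both.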

If a driver inadvertently registers with an incomplete or zero-sized object requirement, cq becomes ZERO_SIZE_PTR, not NULL.
Later, when the kernel tries to use this CQ (e.g., initializing fields), it may access invalid memory, leading to a kernel crash or memory corruption.

Although this is fundamentally a driver registration issue (drivers should specify correct sizes), adding an extra defensive check in __ib_alloc_cq(), such as ZERO_OR_NULL_PTR(cq), would:

  - Prevent crashes caused by incomplete driver initialization
  - Add no meaningful overhead
  - Improve kernel robustness, especially for out-of-tree or legacy drivers

I understand that ZERO_OR_NULL_PTR is not about memory contents, but about the special zero-size pointer case.
In this context, it acts as a safeguard against a specific class of programming error.

Would you accept a patch that replaces !cq with ZERO_OR_NULL_PTR(cq) to cover this corner case? (An explicit IS_ERR_OR_NULL(cq) check would not help here, since ZERO_SIZE_PTR is neither NULL nor an error pointer.)
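
To illustrate why the choice of predicate matters, here is a rough userspace sketch of the two helpers. The logic follows my reading of include/linux/err.h and include/linux/slab.h; the lowercase function names and err_ptr() are hypothetical stand-ins for the kernel macros, written as functions so they can be compiled outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095
#define ZERO_SIZE_PTR ((void *)16)

/* Stand-in for ERR_PTR(): encode a negative errno as a pointer value. */
static void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Stand-in for IS_ERR_OR_NULL(): NULL or one of the top MAX_ERRNO addresses. */
static int is_err_or_null(const void *p)
{
	return p == NULL || (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Stand-in for ZERO_OR_NULL_PTR(): any address from 0 up to ZERO_SIZE_PTR. */
static int zero_or_null_ptr(const void *p)
{
	return (uintptr_t)p <= (uintptr_t)ZERO_SIZE_PTR;
}
```

In this sketch, is_err_or_null(ZERO_SIZE_PTR) is false while zero_or_null_ptr(ZERO_SIZE_PTR) is true. Conversely, zero_or_null_ptr() does not match error pointers such as err_ptr(-12), so the two predicates cover different cases.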

Thanks for your patience and guidance.

Best regards,

luoqing