Re: [PATCH rdma-next v2 1/1] RDMA/mana: Provide a modern CQ creation interface

From: Konstantin Taranov

Date: Thu Mar 05 2026 - 04:23:19 EST


> On Wed, Mar 04, 2026 at 02:06:21PM +0000, Konstantin Taranov wrote:
> > > > > > The uverbs CQ creation UAPI allows users to supply their own
> > > > > > umem for a
> > > > > CQ.
> > > > > > Update mana to support this workflow while preserving support
> > > > > > for creating umem through the legacy interface.
> > > > > >
> > > > > > To support RDMA objects that own umem, extend
> > > > > mana_ib_create_queue()
> > > > > > to return the umem to the caller and do not allocate umem if
> > > > > > it was allocted by the caller.
> > > > > >
> > > > > > Signed-off-by: Konstantin Taranov <kotaranov@xxxxxxxxxxxxx>
> > > > > > ---
> > > > > > v2: It is a rework of the patch proposed by Leon
> > > > >
> > > > > I am curious to know what changes were introduced?
> > > >
> > > > It is like your patch, but I kept get_umem in mana_ib_create_queue
> > > > and introduced ownership.
> > > > It made the code simpler and extendable. In your proposal, it was
> > > > hard to track the changes and it led to double free of the umem.
> > > > With new
> > > > mana_ib_create_queue() it is clear from the caller what happens
> > > > and no special changes in the caller required.
> > > >
> > > > >
> > > > > > drivers/infiniband/hw/mana/cq.c | 125 +++++++++++++++++------
> ----
> > > > > > drivers/infiniband/hw/mana/device.c | 1 +
> > > > > > drivers/infiniband/hw/mana/main.c | 30 +++++--
> > > > > > drivers/infiniband/hw/mana/mana_ib.h | 5 +-
> > > > > > drivers/infiniband/hw/mana/qp.c | 5 +-
> > > > > > drivers/infiniband/hw/mana/wq.c | 3 +-
> > > > > > 6 files changed, 111 insertions(+), 58 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/infiniband/hw/mana/cq.c
> > > > > > b/drivers/infiniband/hw/mana/cq.c index b2749f971..fa951732a
> > > > > > 100644
> > > > > > --- a/drivers/infiniband/hw/mana/cq.c
> > > > > > +++ b/drivers/infiniband/hw/mana/cq.c
> > > > > > @@ -8,12 +8,8 @@
> > > > > > int mana_ib_create_cq(struct ib_cq *ibcq, const struct
> > > > > > ib_cq_init_attr
> > > > > *attr,
> > > > > > struct uverbs_attr_bundle *attrs) {
> > > > > > - struct ib_udata *udata = &attrs->driver_udata;
> > > > > > struct mana_ib_cq *cq = container_of(ibcq, struct
> mana_ib_cq, ibcq);
> > > > > > - struct mana_ib_create_cq_resp resp = {};
> > > > > > - struct mana_ib_ucontext *mana_ucontext;
> > > > > > struct ib_device *ibdev = ibcq->device;
> > > > > > - struct mana_ib_create_cq ucmd = {};
> > > > > > struct mana_ib_dev *mdev;
> > > > > > bool is_rnic_cq;
> > > > > > u32 doorbell;
> > > > > > @@ -26,48 +22,91 @@ int mana_ib_create_cq(struct ib_cq *ibcq,
> > > > > > const
> > > > > struct ib_cq_init_attr *attr,
> > > > > > cq->cq_handle = INVALID_MANA_HANDLE;
> > > > > > is_rnic_cq = mana_ib_is_rnic(mdev);
> > > > > >
> > > > > > - if (udata) {
> > > > > > - if (udata->inlen < offsetof(struct mana_ib_create_cq,
> flags))
> > > > > > - return -EINVAL;
> > > > > > -
> > > > > > - err = ib_copy_from_udata(&ucmd, udata,
> min(sizeof(ucmd),
> > > > > udata->inlen));
> > > > > > - if (err) {
> > > > > > - ibdev_dbg(ibdev, "Failed to copy from udata
> for
> > > > > create cq, %d\n", err);
> > > > > > - return err;
> > > > > > - }
> > > > > > + if (attr->cqe > U32_MAX / COMP_ENTRY_SIZE / 2 + 1)
> > > > > > + return -EINVAL;
> > > > >
> > > > > We are talking about kernel verbs. ULPs are not designed to
> > > > > provide attributes and recover from random driver limitations.
> > > >
> > > > I understand, but there was an ask before to add that check as
> > > > some automated code verifier detected overflow. So if we remote
> > > > it, I guess we get again an ask to fix the potential overflow.
> > > >
> > > > >
> > > > > >
> > > > > > - if ((!is_rnic_cq && attr->cqe > mdev-
> > > > > >adapter_caps.max_qp_wr) ||
> > > > > > - attr->cqe > U32_MAX / COMP_ENTRY_SIZE) {
> > > > > > - ibdev_dbg(ibdev, "CQE %d exceeding
> limit\n", attr-
> > > > > >cqe);
> > > > > > - return -EINVAL;
> > > > > > - }
> > > > > > + buf_size = MANA_PAGE_ALIGN(roundup_pow_of_two(attr-
> >cqe *
> > > > > COMP_ENTRY_SIZE));
> > > > > > + cq->cqe = buf_size / COMP_ENTRY_SIZE;
> > > > > > + err = mana_ib_create_kernel_queue(mdev, buf_size,
> GDMA_CQ,
> > > > > &cq->queue);
> > > > > > + if (err) {
> > > > > > + ibdev_dbg(ibdev, "Failed to create kernel queue for
> create
> > > > > > +cq,
> > > > > %d\n", err);
> > > > > > + return err;
> > > > > > + }
> > > > > > + doorbell = mdev->gdma_dev->doorbell;
> > > > > >
> > > > > > - cq->cqe = attr->cqe;
> > > > > > - err = mana_ib_create_queue(mdev, ucmd.buf_addr,
> cq->cqe
> > > > > * COMP_ENTRY_SIZE,
> > > > > > - &cq->queue);
> > > > > > + if (is_rnic_cq) {
> > > > > > + err = mana_ib_gd_create_cq(mdev, cq, doorbell);
> > > > > > if (err) {
> > > > > > - ibdev_dbg(ibdev, "Failed to create queue for
> create
> > > > > cq, %d\n", err);
> > > > > > - return err;
> > > > > > + ibdev_dbg(ibdev, "Failed to create RNIC cq,
> %d\n",
> > > > > err);
> > > > > > + goto err_destroy_queue;
> > > > > > }
> > > > > >
> > > > > > - mana_ucontext = rdma_udata_to_drv_context(udata,
> struct
> > > > > mana_ib_ucontext,
> > > > > > - ibucontext);
> > > > > > - doorbell = mana_ucontext->doorbell;
> > > > > > - } else {
> > > > > > - if (attr->cqe > U32_MAX / COMP_ENTRY_SIZE / 2 + 1)
> {
> > > > > > - ibdev_dbg(ibdev, "CQE %d exceeding
> limit\n", attr-
> > > > > >cqe);
> > > > > > - return -EINVAL;
> > > > > > - }
> > > > > > - buf_size =
> MANA_PAGE_ALIGN(roundup_pow_of_two(attr-
> > > > > >cqe * COMP_ENTRY_SIZE));
> > > > > > - cq->cqe = buf_size / COMP_ENTRY_SIZE;
> > > > > > - err = mana_ib_create_kernel_queue(mdev, buf_size,
> > > > > GDMA_CQ, &cq->queue);
> > > > > > + err = mana_ib_install_cq_cb(mdev, cq);
> > > > > > if (err) {
> > > > > > - ibdev_dbg(ibdev, "Failed to create kernel
> queue for
> > > > > create cq, %d\n", err);
> > > > > > - return err;
> > > > > > + ibdev_dbg(ibdev, "Failed to install cq callback,
> %d\n",
> > > > > err);
> > > > > > + goto err_destroy_rnic_cq;
> > > > > > }
> > > > > > - doorbell = mdev->gdma_dev->doorbell;
> > > > > > }
> > > > > >
> > > > > > + spin_lock_init(&cq->cq_lock);
> > > > > > + INIT_LIST_HEAD(&cq->list_send_qp);
> > > > > > + INIT_LIST_HEAD(&cq->list_recv_qp);
> > > > > > +
> > > > > > + return 0;
> > > > > > +
> > > > > > +err_destroy_rnic_cq:
> > > > > > + mana_ib_gd_destroy_cq(mdev, cq);
> > > > > > +err_destroy_queue:
> > > > > > + mana_ib_destroy_queue(mdev, &cq->queue);
> > > > > > +
> > > > > > + return err;
> > > > > > +}
> > > > > > +
> > > > > > +int mana_ib_create_user_cq(struct ib_cq *ibcq, const struct
> > > > > ib_cq_init_attr *attr,
> > > > > > + struct uverbs_attr_bundle *attrs) {
> > > > > > + struct mana_ib_cq *cq = container_of(ibcq, struct
> mana_ib_cq, ibcq);
> > > > > > + struct ib_udata *udata = &attrs->driver_udata;
> > > > > > + struct mana_ib_create_cq_resp resp = {};
> > > > > > + struct mana_ib_ucontext *mana_ucontext;
> > > > > > + struct ib_device *ibdev = ibcq->device;
> > > > > > + struct mana_ib_create_cq ucmd = {};
> > > > > > + struct mana_ib_dev *mdev;
> > > > > > + bool is_rnic_cq;
> > > > > > + u32 doorbell;
> > > > > > + int err;
> > > > > > +
> > > > > > + mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
> > > > > > +
> > > > > > + cq->comp_vector = attr->comp_vector % ibdev-
> >num_comp_vectors;
> > > > > > + cq->cq_handle = INVALID_MANA_HANDLE;
> > > > > > + is_rnic_cq = mana_ib_is_rnic(mdev);
> > > > > > +
> > > > > > + if (udata->inlen < offsetof(struct mana_ib_create_cq, flags))
> > > > > > + return -EINVAL;
> > > > > > +
> > > > > > + err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd),
> > > > > > +udata-
> > > > > >inlen));
> > > > > > + if (err) {
> > > > > > + ibdev_dbg(ibdev, "Failed to copy from udata for
> create cq,
> > > > > %d\n", err);
> > > > > > + return err;
> > > > > > + }
> > > > > > +
> > > > > > + if ((!is_rnic_cq && attr->cqe > mdev-
> >adapter_caps.max_qp_wr) ||
> > > > > > + attr->cqe > U32_MAX / COMP_ENTRY_SIZE)
> > > > > > + return -EINVAL;
> > > > > > +
> > > > > > + cq->cqe = attr->cqe;
> > > > > > + err = mana_ib_create_queue(mdev, ucmd.buf_addr, cq->cqe
> *
> > > > > COMP_ENTRY_SIZE,
> > > > > > + &cq->queue, &ibcq->umem);
> > >
> > > I just realized that I forgot to handle the case when ibcq->umem ==
> > > NULL and mana fails later after this call. I need to clean ibcq->umem in
> this case.
> > > I will address that in v3. I am sorry.
> > >
> >
> > Hi Leon,
> > After re-reading the code, I see that there is no bug in v2 as the
> > umem gets deallocated on failure inside the handler of
> > UVERBS_METHOD_CQ_CREATE. I also see that you also had the same logic
> > in v1. So, what is your recommendation? Leave v2 logic as is, so mana
> > would immediately give ownership of umem to cq->umem, and if
> > mana_ib_create_user_cq() fails at later stage it should not clean cq->umem
> and leave it to the caller handle (i.e., UVERBS_METHOD_CQ_CREATE) to clean
> cq->umem regardless of who created it.
> >
> > Or should I make v3, where I will assign umem to cq->umem right before
> > return 0, so that if
> > mana_ib_create_user_cq() fails it does not change cq->umem at all.
>
> My suggestion is to stick with my original patch and remove
> ib_umem_release(queue->umem) from mana_ib_destroy_queue().

Unfortunately, this will break the code and will require more boilerplate workarounds.
MANA still needs to allocate umem for many objects. I see that we can generalize for CQs,
but mana RC QPs use 4 queues at the moment, and I am preparing a patch to have 5th queue
for BIND WQEs for RC. Our UC QP will use 3 queues. Having a nice entity as mana queue allows us to
have clean code and do not have extra complex conditions to detect whether a mana queue
has an umem and the cleanup of mana queue handles that.

My proposal is to keep the nice property of mana queues and just extend the mana_ib_create_queue()
to satisfy your requirements without adding burden of special handling umems. As I understand the new
API requirement is that umem for a CQ should be owned by the ib core, and that is what the helper
achieves: the umem pointer is not stored and assigned to cq->umem directly. The only open
question I have is what the requirement for writing an umem to cq->umem is. In your patch, I see that
you mutate cq->umem even if mana_ib_create_user_cq() fails. Is it the behavior you need/want/allow and
will it be enforced in other IB objects that have one umem (e.g., WQs, SRQs)?
As it seems to be allowed in the UVERBS_METHOD_CQ_CREATE code:
if (ib_dev->ops.create_user_cq)
ret = ib_dev->ops.create_user_cq(cq, &attr, attrs);
else
ret = ib_dev->ops.create_cq(cq, &attr, attrs);
if (ret)
goto err_free;
...
err_free:
ib_umem_release(cq->umem);
If it is not expected, you might enforce it by adding WARN_ON(cq->umem != umem); before ib_umem_release()
And I am happy to adjust mana code to satisfy this behavior.

here I am just trying to get a win-win situation where we both can be happy about the code. I looked at
your proposal and removing ib_umem_release from mana destroy queue will just add it after mana_ib_destroy_queue()
in all code paths except the CQ. As well as we would need to add code to handle failures of ib_umem_get, so keeping it
in the helper removes the need to have that. Instead, I would like to make the helper to handle the cases when umem
is created by upper stack or is not created but want to be owned by the upper stack. What is more, I would like the helper
be general enough to be used for other ib core objects and that is why I would like to know the model so I adjust the helper accordingly.

Thanks
Konstantin

>
> Thanks
>
> >
> > > - Konstantin
> > >
> >
> >