Re: [PATCH rdma-next 0/6] RDMA: Fix restrack UAF in QP/CQ/SRQ destroy
From: Jason Gunthorpe
Date: Fri Jun 12 2026 - 07:54:36 EST
On Fri, Jun 12, 2026 at 11:53:23AM +0300, Patrisious Haddad wrote:
>
> On 6/11/2026 10:11 PM, Jason Gunthorpe wrote:
> > On Sun, Jun 07, 2026 at 09:18:07PM +0300, Edward Srouji wrote:
> > > The resource-tracking (restrack) database is the back-end for the netlink
> > > "rdma resource show" interface which pins objects with
> > > rdma_restrack_get().
> > > The QP/CQ/SRQ destroy flows call rdma_restrack_del() at the end of
> > > ib_destroy_*_user(), after device->ops.destroy_*() had already freed the
> > > vendor object. Therefore, a concurrent netlink dump could look the
> > > object up and touch freed memory, causing a use-after-free via
> > > ib_query_qp() for instance.
> > >
> > > Fix this by splitting the delete into a begin/commit/abort sequence:
> > > begin_del() parks the entry as XA_ZERO_ENTRY (so lookups return NULL),
> > > drops the birth reference and waits for in-flight readers to drain,
> > > while keeping the index reserved. The destroy paths run begin_del()
> > > first, then commit_del() on success or abort_del() on error.
> > > abort_del() re-inserts into the reserved slot, so it needs no allocation
> > > and cannot fail.
> > >
> > > The first two patches remove DCT and raw RSS QP restrack tracking as
> > > they have never worked (their ID is unset/reserved at create time).
> > >
> > > Signed-off-by: Edward Srouji <edwards@xxxxxxxxxx>
> > > ---
> > > Patrisious Haddad (6):
> > > RDMA/mlx5: Remove DCT restrack tracking
> > > RDMA/mlx5: Remove raw RSS QP restrack tracking
> > > RDMA/core: Add rdma_restrack_begin/abort/commit_del() operations
> > > RDMA/core: Fix use after free in ib_query_qp()
> > > RDMA/core: Fix potential use after free in ib_destroy_cq_user()
> > > RDMA/core: Fix potential use after free in ib_destroy_srq_user()
> > The pre-existing sashiko issues look real too, can you fix them also:
> Sure, but one of them is a false positive:
> Before destroy_qp() is called, the counter is unconditionally unbound:
>     rdma_counter_unbind_qp(qp, qp->port, true);
>     ret = qp->device->ops.destroy_qp(qp, udata);
> If destroy_qp() fails and we abort destruction here, the kref on the
> counter was dropped in rdma_counter_unbind_qp(), but qp->counter is never
> set to NULL.
>
> This is actually wrong: qp->counter is set to NULL (inside the driver,
> though, not the core), so subsequent calls will hit the NULL check and
> return safely.
That doesn't sound very good. Why is it like that? Why is a driver
making any permanent change on destroy failure?
> I wonder about places where switching to commit/abort doesn't fix an
> actual bug.
I would change them all.
> and ib_dereg_mr_user() actually calls the delete at the start, so it doesn't
> have this issue (but it also doesn't re-add it to restrack when the driver op
> fails) - but here I think that's by design, since the MR would be in a weird
> state and we don't want to track it?
That doesn't sound right either.
> > I don't think this should NULL the task on abort either, it doesn't
> > seem necessary.
> I don't NULL the task on abort (I do NULL it on commit_del(), though). Or
> were you talking about restrack_sync()?
Since begin_del() calls rdma_restrack_put(), which calls through
rdma_restrack_del(), it NULLs it.
Jason
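
For readers following the thread, the begin/commit/abort sequence described in the quoted cover letter can be modeled in user space roughly as follows. This is a minimal single-threaded sketch under stated assumptions, not the code from the patch: every toy_* name, the fixed-size table, and TOY_ZERO_ENTRY are hypothetical stand-ins for the restrack xarray and XA_ZERO_ENTRY, and "waiting for readers to drain" is reduced to returning the number of outstanding reader references.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 8

/* Hypothetical sentinel standing in for XA_ZERO_ENTRY: the index stays
 * reserved, but lookups treat the slot as empty. */
static int toy_zero_marker;
#define TOY_ZERO_ENTRY ((void *)&toy_zero_marker)

struct toy_entry {
	int id;
	int refs;	/* one "birth" reference plus one per active reader */
};

static void *toy_table[TOY_SLOTS];

/* Insert an entry at a fixed index; fails if the index is taken or parked. */
static int toy_add(struct toy_entry *e, int id)
{
	if (id < 0 || id >= TOY_SLOTS || toy_table[id] != NULL)
		return -1;
	e->id = id;
	e->refs = 1;		/* birth reference */
	toy_table[id] = e;
	return 0;
}

/* Reader lookup: a parked slot is reported as NULL, so no reference is taken. */
static struct toy_entry *toy_get(int id)
{
	void *p;

	if (id < 0 || id >= TOY_SLOTS)
		return NULL;
	p = toy_table[id];
	if (p == NULL || p == TOY_ZERO_ENTRY)
		return NULL;
	((struct toy_entry *)p)->refs++;
	return p;
}

static void toy_put(struct toy_entry *e)
{
	e->refs--;
}

/* Park the slot and drop the birth reference. Returns the number of readers
 * still holding the entry; real code would block until this reaches zero. */
static int toy_begin_del(struct toy_entry *e)
{
	assert(toy_table[e->id] == e);
	toy_table[e->id] = TOY_ZERO_ENTRY;
	e->refs--;
	return e->refs;
}

/* Destroy succeeded: release the reserved index for reuse. */
static void toy_commit_del(struct toy_entry *e)
{
	assert(toy_table[e->id] == TOY_ZERO_ENTRY);
	toy_table[e->id] = NULL;
}

/* Destroy failed: put the entry back into its reserved slot and retake the
 * birth reference. Nothing is allocated, so there is no failure path. */
static void toy_abort_del(struct toy_entry *e)
{
	assert(toy_table[e->id] == TOY_ZERO_ENTRY);
	e->refs++;
	toy_table[e->id] = e;
}
```

In this model a destroy path would call toy_begin_del() before touching the underlying object, then toy_commit_del() if the destroy succeeds or toy_abort_del() if it fails. Because the parked slot is never handed to another entry, the abort step only has to store a pointer back, which is the property the cover letter relies on when it says abort needs no allocation and cannot fail.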