Re: [PATCH rdma-next 0/6] RDMA: Fix restrack UAF in QP/CQ/SRQ destroy
From: Jason Gunthorpe
Date: Thu Jun 11 2026 - 15:11:27 EST
On Sun, Jun 07, 2026 at 09:18:07PM +0300, Edward Srouji wrote:
> The resource-tracking (restrack) database is the back-end for the netlink
> "rdma resource show" interface which pins objects with
> rdma_restrack_get().
> The QP/CQ/SRQ destroy flows call rdma_restrack_del() at the end of
> ib_destroy_*_user(), after device->ops.destroy_*() had already freed the
> vendor object. Therefore, a concurrent netlink dump could look the
> object up and touch freed memory, causing a use-after-free via
> ib_query_qp() for instance.
>
> Fix this by splitting the delete into a begin/commit/abort sequence:
> begin_del() parks the entry as XA_ZERO_ENTRY (so lookups return NULL),
> drops the birth reference and waits for in-flight readers to drain,
> while keeping the index reserved. The destroy paths run begin_del()
> first, then commit_del() on success or abort_del() on error.
> abort_del() re-inserts into the reserved slot, so it needs no allocation
> and cannot fail.
>
> The first two patches remove DCT and raw RSS QP restrack tracking as
> they have never worked (their ID is unset/reserved at create time).
>
> Signed-off-by: Edward Srouji <edwards@xxxxxxxxxx>
> ---
> Patrisious Haddad (6):
> RDMA/mlx5: Remove DCT restrack tracking
> RDMA/mlx5: Remove raw RSS QP restrack tracking
> RDMA/core: Add rdma_restrack_begin/abort/commit_del() operations
> RDMA/core: Fix use after free in ib_query_qp()
> RDMA/core: Fix potential use after free in ib_destroy_cq_user()
> RDMA/core: Fix potential use after free in ib_destroy_srq_user()
The pre-existing sashiko issues look real too. Can you fix them as well:
https://sashiko.dev/#/patchset/20260607-restrack-uaf-fix-v1-0-d72e45eb76c2%40nvidia.com
The sashiko notes about XA_ZERO_ENTRY seem to be obviously wrong, the
return value is already passed through xa_zero_to_null():
void *__xa_cmpxchg(struct xarray *xa, unsigned long index,
			void *old, void *entry, gfp_t gfp)
{
	return xa_zero_to_null(__xa_cmpxchg_raw(xa, index, old, entry, gfp));
}
EXPORT_SYMBOL(__xa_cmpxchg);
This looks legit:
For instance, in drivers/infiniband/core/cq.c:ib_free_cq():
	ret = cq->device->ops.destroy_cq(cq, NULL);
	WARN_ONCE(ret, "Destroy of kernel CQ shouldn't fail");
	rdma_restrack_del(&cq->res);
and so on
Please send a series switching more/all places to commit/abort. There
should probably be very few, if any, calls to a naked
rdma_restrack_del() left.
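As a rough illustration of the begin/commit/abort shape for a destroy
path, here is a toy userspace model. It is not kernel code and is not
taken from the series: every demo_* name is hypothetical, the table
stands in for the restrack xarray, and the "parked" state stands in for
XA_ZERO_ENTRY. The real begin_del() is also described as dropping the
birth reference and waiting for in-flight readers, which this sketch
leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: none of these demo_* names exist in the kernel. */
enum demo_slot_state { DEMO_EMPTY, DEMO_LIVE, DEMO_PARKED };

struct demo_res {
	int id;
};

struct demo_slot {
	enum demo_slot_state state;
	struct demo_res *res;
};

#define DEMO_NSLOTS 8
struct demo_slot demo_table[DEMO_NSLOTS];

/* Publish a resource at a fixed index; fails if the index is taken. */
int demo_add(struct demo_res *res, int id)
{
	if (id < 0 || id >= DEMO_NSLOTS || demo_table[id].state != DEMO_EMPTY)
		return -1;
	res->id = id;
	demo_table[id].state = DEMO_LIVE;
	demo_table[id].res = res;
	return 0;
}

/* Lookup as a dump would do it: parked and empty slots read as NULL. */
struct demo_res *demo_lookup(int id)
{
	if (id < 0 || id >= DEMO_NSLOTS || demo_table[id].state != DEMO_LIVE)
		return NULL;
	return demo_table[id].res;
}

/* Hide the entry from lookups but keep the index reserved. */
void demo_begin_del(struct demo_res *res)
{
	demo_table[res->id].state = DEMO_PARKED;
}

/* Destroy succeeded: release the reserved index. */
void demo_commit_del(struct demo_res *res)
{
	demo_table[res->id].state = DEMO_EMPTY;
	demo_table[res->id].res = NULL;
}

/* Destroy failed: republish into the reserved slot, no allocation. */
void demo_abort_del(struct demo_res *res)
{
	demo_table[res->id].state = DEMO_LIVE;
}

/* Destroy path: hide first, call the driver, then commit or abort. */
int demo_destroy(struct demo_res *res, int (*drv_destroy)(struct demo_res *))
{
	int ret;

	demo_begin_del(res);
	ret = drv_destroy(res);
	if (ret) {
		demo_abort_del(res);
		return ret;
	}
	demo_commit_del(res);
	return 0;
}

/* Stand-in driver callbacks; both check the entry is hidden meanwhile. */
int demo_drv_ok(struct demo_res *res)
{
	assert(demo_lookup(res->id) == NULL);
	return 0;
}

int demo_drv_fail(struct demo_res *res)
{
	assert(demo_lookup(res->id) == NULL);
	return -16; /* arbitrary nonzero error for the model */
}
```

The point of the ordering is that the lookup already returns NULL while
the driver callback runs, so nothing can reach the object between the
driver freeing it and the entry going away. A failed destroy leaves the
entry visible again at the same index.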
This doesn't apply on top of the restrack_sync addition. Please rebase
it.
You should probably be refactoring rdma_restrack_sync() and using its
parts in this implementation since it does the same things.
I don't think this should NULL the task on abort either; it doesn't
seem necessary.
Jason