Re: [PATCH] nvme-rdma: avoid stale device wrapper in remove_one
From: Nilay Shroff
Date: Mon Jun 22 2026 - 08:45:59 EST
On 6/21/26 7:29 PM, Cen Zhang wrote:
nvme_rdma_remove_one() walks nvme_rdma_ctrl_list under
nvme_rdma_ctrl_mutex, but it identifies matching controllers by reading
ctrl->device->dev. The mutex only protects controller list membership.
ctrl->device is a cached copy of queue 0's nvme_rdma_device, and that
wrapper is refcounted by queue lifetime.
The buggy scenario involves two paths, with each column showing the order
within that path:
RDMA remove callback:                Controller error recovery:
1. enter nvme_rdma_remove_one()      1. run nvme_rdma_error_recovery_work()
2. walk nvme_rdma_ctrl_list          2. tear down the admin queue
3. read ctrl->device->dev            3. drop the final queue device ref
                                     4. free the nvme_rdma_device wrapper
Fix this by caching the ib_device identity in the controller when the
admin queue is configured. The remove callback can then compare the
cached pointer value against the ib_device being removed without
dereferencing the queue-owned nvme_rdma_device wrapper. Keep the delete
workqueue flush conditional on actually matching a controller.
Instead of caching the ib_device in struct nvme_rdma_ctrl, I think we can take a
reference on the matching nvme_rdma_device (ndev) while walking device_list.
With that reference held, the nvme_rdma_device cannot be freed until we put the
reference, which covers the whole time we loop through the ctrl list.
Then, loop through the ctrl list and compare ctrl->device directly against that
matching ndev. That would avoid dereferencing ctrl->device->dev while reusing
the existing kref-based lifetime management for nvme_rdma_device, rather than
introducing a separate cached ib_device pointer. At the end, we can drop the
reference to ndev.
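
To make the ordering concrete, here is a rough userspace sketch of the idea
above. It is a model, not driver code: the *_model structs, the plain int
standing in for the kref, the race hook and remove_one_model() itself are all
hypothetical names made up for illustration, and no locking is modelled.

```c
/* Userspace model only: an int stands in for the kref, no locks are taken. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ib_device_model { int id; };

struct ndev_model {                 /* stands in for struct nvme_rdma_device */
	struct ib_device_model *dev;
	int ref;                    /* stands in for the kref */
	bool freed;                 /* set instead of freeing, so callers can inspect it */
};

struct ctrl_model {                 /* stands in for struct nvme_rdma_ctrl */
	struct ndev_model *device;  /* cached copy of queue 0's wrapper */
	bool deleted;
};

/* Get-unless-zero semantics: never revive a wrapper that is already going away. */
static bool ndev_get(struct ndev_model *ndev)
{
	if (ndev->ref == 0)
		return false;
	ndev->ref++;
	return true;
}

static void ndev_put(struct ndev_model *ndev)
{
	assert(ndev->ref > 0);
	if (--ndev->ref == 0)
		ndev->freed = true;
}

/* Hypothetical hook: models error recovery dropping the final queue ref. */
static void drop_final_queue_ref(void *arg)
{
	ndev_put(arg);
}

/* Returns the number of controllers marked for deletion. */
static int remove_one_model(struct ib_device_model *ibdev,
			    struct ndev_model *devs, size_t ndevs,
			    struct ctrl_model *ctrls, size_t nctrls,
			    void (*race)(void *), void *race_arg)
{
	struct ndev_model *match = NULL;
	int deleted = 0;
	size_t i;

	/* 1. Walk the device list and pin the matching wrapper. */
	for (i = 0; i < ndevs; i++) {
		if (devs[i].dev == ibdev && ndev_get(&devs[i])) {
			match = &devs[i];
			break;
		}
	}
	if (!match)
		return 0;

	/* The other path may drop its reference here; ours keeps the wrapper alive. */
	if (race)
		race(race_arg);
	assert(!match->freed);

	/* 2. Walk the ctrl list comparing pointer values only, no dereference. */
	for (i = 0; i < nctrls; i++) {
		if (ctrls[i].device != match)
			continue;
		ctrls[i].deleted = true;
		deleted++;
	}

	/* 3. Drop our reference; the wrapper may go away only now. */
	ndev_put(match);
	return deleted;
}
```

In this model the pointer comparison in step 2 is safe because the pinned
wrapper cannot be released, so its address cannot be handed out to a different
wrapper while the ctrl list is being walked. Whether the same holds in the
driver depends on where the reference is taken relative to device_list_mutex,
which the sketch does not cover.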
Thanks,
--Nilay