Re: [PATCH] nvme: fix deadlock between reset and scan
From: Keith Busch
Date: Tue Nov 28 2023 - 13:00:42 EST

On Tue, Nov 28, 2023 at 12:13:59PM +0200, Sagi Grimberg wrote:
>
>
> On 11/28/23 08:22, yaoma wrote:
> > Hi Keith Busch
> >
> > Thanks for your reply.
> >
> > The idea to avoid such a deadlock between nvme_reset and nvme_scan is to
> > ensure that no namespace can be added to ctrl->namespaces after
> > nvme_start_freeze has already been called. We can achieve this goal by
> > assessing the ctrl->state after we have already acquired the
> > ctrl->namespaces_rwsem lock, to decide whether to add the namespace to
> > the list or not.
> > 1. After we determine that ctrl->state is LIVE, it may be immediately
> > changed to another state. However, since we have already acquired the
> > lock, other tasks cannot access ctrl->namespaces, so we can still safely
> > add the namespace to the list. After acquiring the lock,
> > nvme_start_freeze will freeze all ns->q in the list, including any newly
> > added namespaces.
> > 2. Before the completion of nvme_reset, ctrl->state will not be changed
> > to LIVE, so we will not add any more namespaces to the list. All ns->q
> > in the list are frozen, so nvme_wait_freeze can exit normally.
>
> I agree with the analysis: there is a hole between nvme_start_freeze and
> nvme_wait_freeze in which a scan may add a ns to the ctrl ns list.
>
> However, the fix should be to mark the ctrl with, say, an NVME_CTRL_FROZEN
> flag set in nvme_start_freeze and cleared in nvme_unfreeze (similar
> to what we did with quiesce). Then the scan can check it before adding
> the new namespace (under the namespaces_rwsem).

Could we just make sure that scan_work isn't running? If we reset a live
controller, then we're not depending on reset_work to unblock scan_work,
and can let scan_work end gracefully. The scan_work can't be rescheduled
again while in the resetting state.
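
To illustrate the ordering argument, here is a toy userspace model, not
kernel code. Every model_* name is hypothetical and the structure is a
deliberate simplification: it assumes the freeze only covers namespaces
already on the list and that the wait can only finish once every listed
namespace is frozen, as described earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NS 8

/* Toy stand-in for a controller: each slot is one namespace queue. */
struct model_ctrl {
	bool frozen[MAX_NS];	/* true once that namespace queue is frozen */
	size_t nr_ns;		/* namespaces currently on the list */
	bool scan_pending;	/* a scan still has one namespace to add */
};

/* Models scan_work adding one namespace; a new queue starts unfrozen. */
static void model_scan_work(struct model_ctrl *c)
{
	if (c->scan_pending) {
		assert(c->nr_ns < MAX_NS);
		c->frozen[c->nr_ns++] = false;
	}
	c->scan_pending = false;
}

/* Models flush_work(): run the pending scan to completion, if any. */
static void model_flush_scan(struct model_ctrl *c)
{
	model_scan_work(c);
}

/* Models the start of a freeze: only namespaces on the list right now. */
static void model_start_freeze(struct model_ctrl *c)
{
	for (size_t i = 0; i < c->nr_ns; i++)
		c->frozen[i] = true;
}

/* Models the wait condition: true only if every listed ns is frozen. */
static bool model_all_frozen(const struct model_ctrl *c)
{
	for (size_t i = 0; i < c->nr_ns; i++)
		if (!c->frozen[i])
			return false;
	return true;
}

/*
 * Runs one modeled reset. With flush_first, the scan finishes before the
 * freeze starts; without it, the scan lands in the window after the
 * freeze started. Returns whether the wait step could complete.
 */
static bool model_reset(bool flush_first)
{
	struct model_ctrl c = { .nr_ns = 1, .scan_pending = true };

	if (flush_first)
		model_flush_scan(&c);
	model_start_freeze(&c);
	model_scan_work(&c);	/* no-op if the scan was already flushed */
	return model_all_frozen(&c);
}
```

In this model, model_reset(true) leaves every namespace frozen, while
model_reset(false) leaves the late namespace unfrozen, which is the case
where the wait would never finish.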
---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index fad4cccce745c..5d6305475bad5 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2701,8 +2701,10 @@ static void nvme_reset_work(struct work_struct *work)
 	 * If we're called to reset a live controller first shut it down before
 	 * moving on.
 	 */
-	if (dev->ctrl.ctrl_config & NVME_CC_ENABLE)
+	if (dev->ctrl.ctrl_config & NVME_CC_ENABLE) {
+		flush_work(&dev->ctrl.scan_work);
 		nvme_dev_disable(dev, false);
+	}
 	nvme_sync_queues(&dev->ctrl);
 
 	mutex_lock(&dev->shutdown_lock);
--