Re: [PATCH] nvme-pci: cancel nvme device request before disabling

From: Tong Zhang
Date: Fri Aug 28 2020 - 08:43:51 EST


Hi Keith,
Thanks for the confirmation. I will send another revision incorporating
your comments.
Best,
- Tong

On Thu, Aug 27, 2020 at 11:01 AM Keith Busch <kbusch@xxxxxxxxxx> wrote:
>
> On Fri, Aug 14, 2020 at 12:11:56PM -0400, Tong Zhang wrote:
> > On Fri, Aug 14, 2020 at 11:42 AM Keith Busch <kbusch@xxxxxxxxxx> wrote:
> > > > > On Fri, Aug 14, 2020 at 03:14:31AM -0400, Tong Zhang wrote:
> > > > > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > > > > > index ba725ae47305..c4f1ce0ee1e3 100644
> > > > > > --- a/drivers/nvme/host/pci.c
> > > > > > +++ b/drivers/nvme/host/pci.c
> > > > > > @@ -1249,8 +1249,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
> > > > > > dev_warn_ratelimited(dev->ctrl.device,
> > > > > > "I/O %d QID %d timeout, disable controller\n",
> > > > > > req->tag, nvmeq->qid);
> > > > > > - nvme_dev_disable(dev, true);
> > > > > > nvme_req(req)->flags |= NVME_REQ_CANCELLED;
> > > > > > + nvme_dev_disable(dev, true);
> > > > > > return BLK_EH_DONE;
> >
> > > anymore. The driver is not reporting non-response back for all
> > > cancelled requests, and that is probably not what we should be doing.
> >
> > OK, thanks for the explanation. I think the bottom line here is to let the
> > probe function know and stop proceeding when there's an error.
> > I also don't see an obvious reason to set NVME_REQ_CANCELLED
> > after nvme_dev_disable(dev, true).
>
> The flag was set after disabling when it didn't happen to matter: the
> block layer had a complicated timeout scheme that didn't actually
> complete the request until the timeout handler returned, so setting the flag
> where it is was 'ok'. That's clearly not the case anymore, so yes, I
> think we do need your patch.
>
> There is one case you are missing, though:
>
> ---
> @@ -1267,10 +1267,10 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
> dev_warn(dev->ctrl.device,
> "I/O %d QID %d timeout, reset controller\n",
> req->tag, nvmeq->qid);
> + nvme_req(req)->flags |= NVME_REQ_CANCELLED;
> nvme_dev_disable(dev, false);
> nvme_reset_ctrl(&dev->ctrl);
>
> - nvme_req(req)->flags |= NVME_REQ_CANCELLED;
> return BLK_EH_DONE;
> }
> --