Re: 6.2 nvme-pci: something wrong

From: Keith Busch
Date: Sat Dec 24 2022 - 17:06:54 EST


On Fri, Dec 23, 2022 at 09:24:56PM -0800, Hugh Dickins wrote:
> Hi Christoph,
>
> There's something wrong with the nvme-pci heading for 6.2-rc1:
> no problem booting here on this Lenovo ThinkPad X1 Carbon 5th,
> but under load...
>
> nvme nvme0: I/O 0 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: I/O 1 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: I/O 2 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: I/O 3 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: I/O 0 QID 2 timeout, reset controller
>
> ...and more, until I just have to poweroff and reboot.
>
> Bisection points to your
> 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
> And that does revert cleanly, giving a kernel which shows no problem.
>
> I've spent a while comparing old nvme_pci_alloc_tag_set() and new
> nvme_alloc_io_tag_set(), I do not know my way around there at all
> and may be talking nonsense, but it did look as if there might now
> be a difference in the queue_depth, sqsize, q_depth conversions.
>
> I'm running load successfully with the patch below, but I strongly
> suspect that the right patch will be somewhere else: over to you!
>
> Hugh
>
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4926,7 +4926,7 @@ int nvme_alloc_io_tag_set(struct nvme_ct
>
> memset(set, 0, sizeof(*set));
> set->ops = ops;
> - set->queue_depth = ctrl->sqsize + 1;
> + set->queue_depth = ctrl->sqsize;

What you're observing is a queue-wrap condition that makes it impossible
for the controller to know there are new commands.
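To illustrate the wrap (a minimal user-space sketch, nothing nvme-specific;
the names and the slot count are made up): with "empty" defined as
head == tail, letting the producer fill every slot makes a full ring
indistinguishable from an empty one, so one slot has to stay free.

    #include <stdbool.h>

    #define N 4                     /* hypothetical slot count */

    struct ring {
            unsigned int head;      /* consumer index */
            unsigned int tail;      /* producer index */
    };

    static bool ring_empty(const struct ring *r)
    {
            return r->head == r->tail;
    }

    static bool ring_full(const struct ring *r)
    {
            /* full when advancing tail would collide with head */
            return ((r->tail + 1) % N) == r->head;
    }

    static bool ring_submit(struct ring *r)
    {
            if (ring_full(r))
                    return false;   /* must leave one slot empty */
            r->tail = (r->tail + 1) % N;
            return true;
    }

Only N - 1 entries can ever be in flight here, which is the same rule the
NVMe submission queue relies on.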

Your patch does look like the correct thing to do. The "zero means one"
thing is a confusing distraction, I think. It makes more sense if you
consider sqsize as the maximum number of tags we can have outstanding at
one time, and it looks like all the drivers set it that way. We're
supposed to leave one slot empty in a full NVMe queue, so adding one
here to report the total number of slots isn't right: that would allow
us to fill every slot.
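Purely illustrative numbers (assuming a 1024-slot submission queue, which
is not what your controller necessarily reports):

    #include <stdio.h>

    int main(void)
    {
            unsigned int slots = 1024;        /* hypothetical SQ slot count */
            unsigned int sqsize = slots - 1;  /* max commands outstanding: 1023 */

            /* queue_depth = sqsize keeps one slot free;
             * sqsize + 1 would let the block layer fill every slot
             * and wrap the queue. */
            printf("queue_depth = %u (of %u slots)\n", sqsize, slots);
            return 0;
    }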

Fabrics drivers have been using this method for a while, though, so it's
interesting that they haven't hit a similar problem.