Re: [REGRESSION] nvme: code command_id with a genctr for use-after-free validation crashes apple T2 SSD

From: Aditya Garg
Date: Mon Sep 27 2021 - 00:51:48 EST


I am getting the same error.

________________________________________
From: Orlando Chamberlain <redecorating@xxxxxxxxxxxxxx>
Sent: Monday, September 27, 2021 4:22 AM
To: Sagi Grimberg; Aditya Garg; kbusch@xxxxxxxxxx
Cc: linux-nvme@xxxxxxxxxxxxxxxxxxx; regressions@xxxxxxxxxxxxxxx; hare@xxxxxxx; dwagner@xxxxxxx; hch@xxxxxx
Subject: Re: [REGRESSION] nvme: code command_id with a genctr for use-after-free validation crashes apple T2 SSD

On 26/9/21 18:44, Sagi Grimberg wrote:
>
>> I tried Orlando Chamberlain's proposal to replace "NVME_QUIRK_SHARED_TAGS ," with "NVME_QUIRK_SHARED_TAGS |" in the patch at http://lists.infradead.org/pipermail/linux-nvme/2021-September/027665.html. With the "," version, the T2 still panics as described before. With the "|" version, the kernel boots correctly without panicking the T2, but when Linux is installed on an external drive, as in my case, the internal SSD doesn't seem to be recognised at all. I tested the patch on 5.14.7.
>
> That sounds like a separate issue, because with this patch applied,
> all tags should be within the queue entry range (with generation
> set to 0 always).
>
> Is it possible that the io_queue_depth is being set to something
> that exceeds NVME_PCI_MAX_QUEUE_SIZE (4095)? The default is 1024.
>
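For reference, a minimal sketch of the encoding being discussed above. It assumes the 16-bit command_id is split into a 4-bit generation counter and a 12-bit tag, which is what the 4095 limit suggests. All names here (cid_encode, skip_genctr, queue_depth_fits and so on) are made up for illustration and are not the driver's actual identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: command_id = [4-bit generation counter | 12-bit tag]. */
#define CID_TAG_BITS   12
#define CID_TAG_MASK   0x0fffu
#define CID_GEN_MASK   0x000fu
#define MAX_QUEUE_SIZE 4095u /* value quoted for NVME_PCI_MAX_QUEUE_SIZE */

/*
 * Pack a generation counter and a tag into one command_id.
 * skip_genctr is a hypothetical stand-in for the quirk: when set,
 * the generation is forced to 0, so the command_id equals the tag.
 */
static uint16_t cid_encode(uint8_t gen, uint16_t tag, bool skip_genctr)
{
	assert(tag <= CID_TAG_MASK);
	if (skip_genctr)
		gen = 0;
	return (uint16_t)(((gen & CID_GEN_MASK) << CID_TAG_BITS) | tag);
}

/* Recover the tag (low 12 bits) from a command_id. */
static uint16_t cid_tag(uint16_t cid)
{
	return cid & CID_TAG_MASK;
}

/* Recover the generation counter (high 4 bits) from a command_id. */
static uint8_t cid_gen(uint16_t cid)
{
	return (uint8_t)(cid >> CID_TAG_BITS);
}

/* A queue depth only fits if every tag can be held in the 12 tag bits. */
static bool queue_depth_fits(unsigned int depth)
{
	return depth <= MAX_QUEUE_SIZE;
}
```

Under these assumptions, with the generation forced to 0 every command_id stays below the queue depth, which is why a failure that persists with the patch applied looks like a separate issue.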
I've been able to reproduce it by using the same kernel Aditya is using:
https://github.com/AdityaGarg8/T2-Big-Sur-Ubuntu-Kernel/actions/runs/1275383460

From the initramfs:

# dmesg | grep nvme
nvme nvme0: pci function 0000:04:00.0
nvme nvme0: 1/0/0 default/read/poll queues
nvme nvme0: Identify NS List failed (status=0xb)
nvme nvme0: LightNVM init failure

It might be because this is 5.14.7, while I've been using 5.15-rc2. Additionally,
there are differences in the kernel configs; I've put both configs in this gist:
https://gist.github.com/Redecorating/c8cf574df969f9b4f626dfb9c6b2a758