Re: Fwd: Lexar NM790 SSDs are not recognized anymore after 6.1.50 LTS

From: Keith Busch
Date: Tue Sep 05 2023 - 12:05:53 EST


On Tue, Sep 05, 2023 at 01:37:36PM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 04.09.23 13:07, Bagas Sanjaya wrote:
> >
> > I notice a regression report on Bugzilla [1]. Quoting from it:
> >
> >> I bought a new 4 TB Lexar NM790 and I was using kernel 6.3.13 at the time. It wasn't recognized, with these messages in dmesg:
> >>
> >> [ 358.950147] nvme nvme0: pci function 0000:06:00.0
> >> [ 358.958327] nvme nvme0: Device not ready; aborting initialisation, CSTS=0x0
> >>
> >> My other NVMe appears correctly in the nvme list though.
> >>
> >>
> >> So I tried using other kernels I had installed at the time: 6.3.7, 6.4.10, 6.5.0-rc6, 6.5.0, 6.5.1 and none of these recognized the disk.
> >> I installed the 6.1.50 LTS kernel from the Arch repositories (I can compile my own too if this would be an issue) and then the device was correctly recognized:
> >>
> >> [ 4.654613] nvme 0000:06:00.0: platform quirk: setting simple suspend
> >> [ 4.654632] nvme nvme0: pci function 0000:06:00.0
> >> [ 4.667290] nvme nvme0: allocated 40 MiB host memory buffer.
> >> [ 4.709473] nvme nvme0: 16/0/0 default/read/poll queues
>
> FWIW, the quoted mail missed one crucial detail:
> """
> Claudio Sampaio 2023-09-02 19:04:29 UTC
>
> Adding the two lines
>
>   { PCI_DEVICE(0x1d97, 0x1602),   /* Lexar NM790 */
>     .driver_data = NVME_QUIRK_BOGUS_NID, },
>
> in file drivers/nvme/host/pci.c made my NVMe work correctly. Compiled a
> new 6.5.1 kernel and everything works.
> """
>
> @NVME maintainers: is there anything more you need from Claudio at this
> point?

Yes: it doesn't really make any sense. The report says the device
stopped showing up with the message:

nvme nvme0: Device not ready; aborting initialisation, CSTS=0x0

(a) That message is printed long before the driver considers the
mentioned quirk, and (b) the "quirk" behavior is now the default in 6.5
and in several of the listed stable kernels anyway.
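To make point (a) concrete, here is a minimal userspace sketch, not the driver's actual code, of why CSTS=0x0 ends initialisation early. The CSTS bit positions (RDY in bit 0, CFS in bit 1, SHST in bits 3:2) follow the NVMe base specification; `struct model_ctrl`, `model_init()` and the `ns_scanned` flag are hypothetical names invented for illustration. In this simplified model the readiness check returns before the namespace scan, which is where a quirk about namespace identifiers, such as NVME_QUIRK_BOGUS_NID, would come into play.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* CSTS (Controller Status) fields from the NVMe base specification. */
#define CSTS_RDY        (1u << 0)               /* bit 0: Ready */
#define CSTS_CFS        (1u << 1)               /* bit 1: Controller Fatal Status */
#define CSTS_SHST_SHIFT 2                       /* bits 3:2: Shutdown Status */
#define CSTS_SHST_MASK  (3u << CSTS_SHST_SHIFT)

/* Hypothetical controller model; not a kernel structure. */
struct model_ctrl {
    uint32_t csts;        /* last CSTS value read from the device */
    unsigned long quirks; /* flags copied from a PCI ID table at probe */
    bool ns_scanned;      /* true once namespace identification ran */
};

static bool csts_ready(uint32_t csts) { return (csts & CSTS_RDY) != 0; }
static bool csts_fatal(uint32_t csts) { return (csts & CSTS_CFS) != 0; }

static unsigned csts_shst(uint32_t csts)
{
    return (csts & CSTS_SHST_MASK) >> CSTS_SHST_SHIFT;
}

/* Simplified ordering: the readiness check gates everything after it,
 * including the namespace scan where an NID-related quirk would matter. */
static int model_init(struct model_ctrl *ctrl)
{
    if (!csts_ready(ctrl->csts)) {
        fprintf(stderr, "Device not ready; aborting initialisation, CSTS=0x%x\n",
                (unsigned)ctrl->csts);
        return -1;
    }
    /* Only from here on would a bogus-NID style quirk be consulted. */
    ctrl->ns_scanned = true;
    return 0;
}
```

With CSTS=0x0, `csts_ready()` is false, so `model_init()` prints a message in the same style as the report and returns -1 without ever setting `ns_scanned`, regardless of which quirk flags were set.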

It sounds more likely that the device is flaky: either it never becomes
ready because of some unspecified internal firmware condition, or it
inaccurately reports how long it actually needs to become ready in the
worst case.
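The second scenario concerns the ready timeout the controller advertises. The sketch below, again a hypothetical userspace model rather than the driver's implementation, derives a budget from CAP.TO (bits 31:24 of the 64-bit CAP register, in 500 ms units per the NVMe base specification) and polls CSTS.RDY on a simulated clock. `struct model_dev`, `model_wait_ready()` and the polling interval are assumptions for illustration; the real driver's arithmetic, polling interval and any use of other timeout fields may differ.

```c
#include <stdint.h>

/* CAP.TO: bits 31:24 of CAP, worst-case ready time in 500 ms units. */
static uint32_t cap_timeout_ms(uint64_t cap)
{
    return (uint32_t)((cap >> 24) & 0xff) * 500u;
}

/* Hypothetical device model: CSTS.RDY becomes 1 once the simulated
 * clock reaches ready_after_ms (UINT32_MAX models "never ready"). */
struct model_dev {
    uint64_t cap;
    uint32_t ready_after_ms;
};

static uint32_t model_read_csts(const struct model_dev *dev, uint32_t now_ms)
{
    return now_ms >= dev->ready_after_ms ? 1u : 0u; /* only RDY modelled */
}

/* Poll CSTS.RDY every poll_ms on a simulated clock until the device is
 * ready or the CAP.TO budget is used up. Returns the elapsed time in ms
 * on success, or -1 on timeout. */
static long model_wait_ready(const struct model_dev *dev, uint32_t poll_ms)
{
    uint32_t budget = cap_timeout_ms(dev->cap);

    if (poll_ms == 0)
        poll_ms = 1; /* avoid spinning without advancing the clock */

    for (uint32_t now = 0; ; now += poll_ms) {
        if (model_read_csts(dev, now) & 1u)
            return (long)now;
        if (now >= budget)
            return -1;
    }
}
```

For example, a device advertising CAP.TO = 20 (10 s) that really needs 15 s is given up on at the 10 s mark and reported as not ready, even though it would eventually have come up.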