Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)

From: John Meneghini
Date: Tue Jul 11 2023 - 18:08:39 EST


Yes, this is what I thought. This is all the result of the duplicate NID check added to deal with TP4034 Dispersed Namespaces.

One suggestion I have would be to limit this check to NVMe-oF subsystems only. These are the only devices I am aware of that support TP4034. Moreover, all NVMe-oF devices report a valid NID; NVMe over Fabrics requires it. PCIe devices, I expect, don't care: you don't really need a valid NID with a private namespace, which is what most PCIe devices have.

I'll wager that if you change nvme_global_check_duplicate_ids() to check only NVMe-oF subsystems, and simply continue with PCIe subsystems, 90% of these NVMe quirks can be removed.
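
A minimal userspace sketch of that idea, not the actual driver code: the structures, the is_fabrics flag, and check_duplicate_ids() are simplified, hypothetical stand-ins for the kernel's subsystem and namespace-ID types. It only shows the shape of the change, which is that the global duplicate check returns early for PCIe subsystems and still rejects duplicates for fabrics subsystems.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a namespace's identifiers. */
struct ns_ids {
	unsigned char nguid[16];
};

/* Hypothetical, simplified stand-in for an NVMe subsystem. */
struct subsys {
	bool is_fabrics;          /* assumed flag: true for NVMe-oF, false for PCIe */
	const struct ns_ids *ids; /* IDs of namespaces already registered here */
	size_t nr_ids;
};

/* Return true if `ids` matches an ID already registered in `s`. */
static bool subsys_has_ids(const struct subsys *s, const struct ns_ids *ids)
{
	for (size_t i = 0; i < s->nr_ids; i++)
		if (memcmp(s->ids[i].nguid, ids->nguid, sizeof(ids->nguid)) == 0)
			return true;
	return false;
}

/*
 * Check `ids` against every other subsystem, but only when the
 * subsystem being scanned is a fabrics one. PCIe subsystems skip the
 * check and simply continue. Returns -EINVAL on a duplicate, else 0.
 */
static int check_duplicate_ids(const struct subsys *this,
			       const struct subsys *all, size_t nr_subsys,
			       const struct ns_ids *ids)
{
	if (!this->is_fabrics)
		return 0; /* PCIe: a private namespace does not need a unique NID */

	for (size_t i = 0; i < nr_subsys; i++) {
		if (&all[i] == this)
			continue; /* duplicates within a subsystem are a separate check */
		if (subsys_has_ids(&all[i], ids))
			return -EINVAL;
	}
	return 0;
}
```

With this shape, a PCIe device that reports an ID already seen elsewhere is still brought up, while a fabrics device doing the same is still refused.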

John Meneghini
Senior Principal Platform Storage Engineer
RHEL SST - Platform Storage Group
jmeneghi@xxxxxxxxxx

On 7/11/23 13:21, Keith Busch wrote:
On Tue, Jul 11, 2023 at 09:47:00AM -0700, Linus Torvalds wrote:
On Tue, 11 Jul 2023 at 05:06, Christoph Hellwig <hch@xxxxxx> wrote:
For example, we have this completely unacceptable garbage:

    ret = nvme_global_check_duplicate_ids(ctrl->subsys, &info->ids);
    if (ret) {
        dev_err(ctrl->device,
            "globally duplicate IDs for nsid %d\n", info->nsid);
        nvme_print_device_info(ctrl);
        return ret;
    }

iow, the code even checks for and *notices* that there are duplicate
IDs, and what does it do? It then errors out.

This check came from a recent half-baked spec feature called "Dispersed
Namespaces" that caused breakage and data corruption when used in Linux.
Rather than attempt to support that mostly vendor specific feature, the
driver attempted to fence that off as unmaintainable. This check wasn't
aimed at enforcing "correctness", but it certainly found a lot of that
as collateral damage. Let's see if we can find a better way to detect
the difference with a sane fallback as you suggest.
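
One possible shape for such a fallback, as a hedged userspace sketch rather than a proposed patch: handle_duplicate_ids() and its parameters are hypothetical, and the policy shown (keep failing for fabrics, warn and clear the identifiers for PCIe) is an assumption drawn from the suggestions in this thread, not something the driver is confirmed to do.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a namespace's identifiers. */
struct ns_ids {
	unsigned char eui64[8];
	unsigned char nguid[16];
};

/*
 * Decide what to do once a globally duplicate ID has been detected.
 *
 * Assumed policy:
 *  - fabrics: unique IDs are required, so keep returning an error;
 *  - PCIe: warn, drop the untrustworthy IDs, and let the namespace
 *    come up without them instead of refusing the device.
 */
static int handle_duplicate_ids(bool duplicate, bool is_fabrics,
				struct ns_ids *ids)
{
	if (!duplicate)
		return 0;
	if (is_fabrics)
		return -EINVAL;

	fprintf(stderr, "duplicate IDs on a PCIe namespace, ignoring them\n");
	memset(ids, 0, sizeof(*ids)); /* treat the namespace as having no IDs */
	return 0;
}
```

The point of the sketch is that noticing the duplicate and failing the device are separate decisions; only the fabrics path needs to tie them together.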