[Regression] File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX
From: Thorsten Leemhuis
Date: Wed Jan 08 2025 - 09:40:46 EST
[side note TWIMC: regression tracking is sadly kinda dormant temporarily
(hopefully this will change again soon), but this was brought to my
attention and looked kinda important]
Hi, Thorsten here, the Linux kernel's regression tracker.
Adrian, Christoph, I noticed a report about a regression in
bugzilla.kernel.org that appears to be caused by a change you two
handled a while ago -- or it exposed an earlier problem:
3710e2b056cb92 ("nvme-pci: clamp max_hw_sectors based on DMA optimized
limitation") [v6.4-rc3]
As many (most?) kernel developers don't keep an eye on the bug tracker,
I decided to write this mail. To quote from
https://bugzilla.kernel.org/show_bug.cgi?id=219609 :
> Bug 219609 - File corruptions on SSD in 1st M.2 socket of AsRock X600M-STX + Ryzen 8700G
>
> there are one or two bugs which were originally reported at
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1076372 . For details
> (logs, etc.), see there. Here, I will post a summary and try to point
> out the most relevant observations:
>
> Bug 1: Write errors with Lexar NM790 NVME
>
> * Occur since Debian kernel 6.5, but reproduced with upstream kernel
> 6.11.5 (the only upstream kernel I tested)
> * Only occur in 1st M.2 socket (not in the 2nd one on rear side)
> * Easiest way to reproduce them is to use f3 (
> https://fight-flash-fraud.readthedocs.io/en/latest/usage.html ). f3
> reports overwritten sectors
> * The errors seem not to occur in the last files of 500-file (=500 GB)
> test runs and I never detected file system corruption (just defective
> files; I produced probably more than a thousand of them). The reason for
> the latter observation is maybe that file system information is written
> last. (See message 113 in the Debian bug report)
>
> (Possible) Bug 2: Read errors with Kingston FURY Renegade
>
> * Only occur in 1st M.2 socket (did not test the rear socket, because
> the warranty seal would have to be broken in order to remove the heat sink)
> * Almost impossible to reproduce it, only detected it in the Debian kernel
> that is based on 6.1.112
> * 1st occurrence: I detected it in an SSD-intensive computation (as data
> cache) which produced wrong results after a few days (but not in the
> first days). The error could be reproduced with f3: The corruptions were
> massive and different files were affected in subsequent f3read runs (==>
> read errors). Unfortunately I did not store the f3 logs. (I still have
> the corrupt computation results, so it was real.)
> * 2nd occurrence: A single defect sector (read error) in a multi-day
> attempt to reproduce the error with the same kernel (Debian 6.1.112),
> see message 113 in the Debian bug report
>
> Consideration / Notes:
> * These serial links (PCIe) need to be calibrated. Calibration issues
> would explain why the errors (dis)appear under certain conditions. But
> errors like this should be detected (nothing could be found in the
> kernel logs). Is the error correction possibly inactive? However, this
> still does not explain why f3 reports overwritten sectors, unless the
> signal errors occur during command / address transmission.
> * Testing is difficult, because the machine is installed remotely and in
> use. ATM, until about the end of January, I can run tests for bug 1.
> * On the AsRock X600M-STX mainboard (without chipset), the CPU (Ryzen
> 8700G) runs in SoC (system on chip) mode. Maybe someone did not test
> this properly ...
>
[...]
> With the help of TJ from the Debian kernel team (
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1076372 ), at least a
> workaround could be found.
>
> The bug is triggered by the patch "nvme-pci: clamp max_hw_sectors
> based on DMA optimized limitation" (see
> https://lore.kernel.org/linux-iommu/20230503161759.GA1614@xxxxxx/ )
> introduced in 6.3.7
>
> To examine the situation, I added this debug info (all files are
> located in `drivers/nvme/host`):
>
>> --- core.c.orig 2025-01-03 14:27:38.220428482 +0100
>> +++ core.c 2025-01-03 12:56:34.503259774 +0100
>> @@ -3306,6 +3306,7 @@
>> max_hw_sectors = nvme_mps_to_sectors(ctrl, id->mdts);
>> else
>> max_hw_sectors = UINT_MAX;
>> + dev_warn(ctrl->device, "id->mdts=%d, max_hw_sectors=%d, ctrl->max_hw_sectors=%d\n", id->mdts, max_hw_sectors, ctrl->max_hw_sectors);
>> ctrl->max_hw_sectors =
>> min_not_zero(ctrl->max_hw_sectors, max_hw_sectors);
>
> 6.3.6 (last version w/o mentioned patch and w/o data corruption) says:
>
>> [ 127.196212] nvme nvme0: id->mdts=7, max_hw_sectors=1024,
>> ctrl->max_hw_sectors=16384
>> [ 127.203530] nvme nvme0: allocated 40 MiB host memory buffer.
>
> 6.3.7 (first version w/ mentioned patch and w/ data corruption) says:
>
>> [ 46.436384] nvme nvme0: id->mdts=7, max_hw_sectors=1024,
>> ctrl->max_hw_sectors=256
>> [ 46.443562] nvme nvme0: allocated 40 MiB host memory buffer.
>
> After I reverted the mentioned patch (
>
>> --- pci.c.orig 2025-01-03 14:28:05.944819822 +0100
>> +++ pci.c 2025-01-03 12:54:37.014579093 +0100
>> @@ -3042,7 +3042,8 @@
>> * over a single page.
>> */
>> dev->ctrl.max_hw_sectors = min_t(u32,
>> - NVME_MAX_KB_SZ << 1, dma_opt_mapping_size(&pdev->dev) >> 9);
>> +// NVME_MAX_KB_SZ << 1, dma_opt_mapping_size(&pdev->dev) >> 9);
>> + NVME_MAX_KB_SZ << 1, dma_max_mapping_size(&pdev->dev) >> 9);
>> dev->ctrl.max_segments = NVME_MAX_SEGS;
>>
>> /*
>
> ), 6.11.5 (used this version because the sources were lying around) works and says:
>
>> [ 1.251370] nvme nvme0: id->mdts=7, max_hw_sectors=1024,
>> ctrl->max_hw_sectors=16384
>> [ 1.261168] nvme nvme0: allocated 40 MiB host memory buffer.
>
> Thus, the corruption occurs if `ctrl->max_hw_sectors` is set to a different (smaller) value than the one defined by `id->mdts`.
>
> If this should be allowed, the mentioned patch is not the (root) cause, but reversion is at least a workaround.
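
For orientation, the numbers in the debug output above are consistent with
the arithmetic sketched below. This is only a userspace illustration, not
kernel code: the helper names are made up for this mail and merely mirror
the logic of nvme_mps_to_sectors() and min_not_zero(), and page_shift = 12
(a 4 KiB minimum controller page size) is an assumption that the report
does not state explicitly, although it matches the logged mdts=7 ->
max_hw_sectors=1024.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for nvme_mps_to_sectors(): MDTS is a power-of-two
 * multiple of the minimum controller page size; the result is expressed
 * in 512-byte sectors. page_shift = 12 (4 KiB pages) is an assumption.
 */
uint32_t mdts_to_sectors(uint32_t mdts, uint32_t page_shift)
{
    uint32_t shift = mdts + page_shift - 9;

    /* Saturate instead of shifting out of range. */
    return shift >= 32 ? UINT32_MAX : (uint32_t)1 << shift;
}

/* Same idea as the kernel's min_not_zero(): a zero operand is ignored. */
uint32_t min_not_zero_u32(uint32_t a, uint32_t b)
{
    if (a == 0)
        return b;
    if (b == 0)
        return a;
    return a < b ? a : b;
}

/*
 * Combine the limit set by the PCI driver (derived from the DMA mapping
 * size) with the limit the controller advertises via MDTS.
 * mdts == 0 means the controller reports no limit.
 */
uint32_t effective_max_hw_sectors(uint32_t pci_limit, uint32_t mdts,
                                  uint32_t page_shift)
{
    uint32_t mdts_limit = mdts ? mdts_to_sectors(mdts, page_shift)
                               : UINT32_MAX;

    return min_not_zero_u32(pci_limit, mdts_limit);
}

/* 512-byte sectors to KiB. */
uint32_t sectors_to_kib(uint32_t sectors)
{
    return sectors / 2;
}
```

Fed with the logged values, effective_max_hw_sectors(16384, 7, 12) gives
1024 sectors (512 KiB) for 6.3.6 and for 6.11.5 with the revert, while
effective_max_hw_sectors(256, 7, 12) gives 256 sectors (128 KiB) for 6.3.7.
In other words, with the patch applied the host-side DMA limit becomes the
binding one instead of the limit the drive advertises via MDTS.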
See the ticket for more details. Note, you have to use bugzilla to reach
the reporter, as I sadly[1] cannot CC them in mails like this.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
[1] because bugzilla.kernel.org tells users upon registration their
"email address will never be displayed to logged out users"