Re: [PATCH v2] PCI: Add quirk to disable ASPM L1 for Sandisk SN740 NVMe SSDs

From: Val Packett

Date: Thu Dec 04 2025 - 16:28:19 EST


On 12/4/25 9:51 AM, Konrad Dybcio wrote:
> On 12/1/25 7:48 AM, Val Packett wrote:
>> On 11/25/25 2:21 AM, Manivannan Sadhasivam wrote:
>> [..]
>>> There are a couple of points that made me convince myself:
>>>
>>> * Other X1E laptops are working fine with ASPM L1.
>>> * This laptop has WCN785x WiFi/BT combo card connected to the other controller
>>> instance and L1 is working fine for it.
>>> * There is no known issue with ASPM L1 in X1E chipsets.
>>>
>>> Because of these, I was so certain that the NVMe is the fault here.
>> There is *a* known issue with ASPM L1 on X1E, reported by maaaany users on #aarch64-laptops, that we discussed in another thread..
>>
>> But it is a full system freeze, **not** a correctable AER message, and it definitely happens with a bunch of various SSDs on various laptops. I personally have had it happen both with the SN740 and an SK Hynix drive, on a Latitude 7455. It's an SSD-only issue (disabling ASPM just for the drive, but keeping it on for the WiFi, was enough to get to month-long uptime) but not specific to any SSD model.
> Are the steps to reproduce roughly
>
> * boot without disabling ASPM
> * wait
> * system reboots on its own (or just freezes?)
>
> ?

Yeah.

The wait can be anywhere from minutes to days; it seems completely random and "luck based".

In EL1, the system freezes for a minute and gets rebooted by the watchdog.

In EL2, as I have just now discovered, some cores can still be running (presumably those that haven't tried accessing the drive) while others hang, so we can get a proper panic. I got this logged to efi_pstore:

<0>[ 1500.017790] watchdog: CPU3: Watchdog detected hard LOCKUP on cpu 4
<4>[ 1500.017801] Modules linked in: [..]
<6>[ 1500.017937] Sending NMI from CPU 3 to CPUs 4:
<0>[ 1510.017956] Kernel panic - not syncing: Hard LOCKUP
<4>[ 1510.017970] Call trace: [one with watchdog_hardlockup_check, from CPU3]
<2>[ 1510.018062] SMP: stopping secondary CPUs
<4>[ 1511.085450] SMP: failed to stop secondary CPUs 4-11

No traces from the frozen cores are logged as they don't respond to NMI. They are *completely* wedged.
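
In case anyone wants to compare dumps from different machines, here is a rough Python sketch that pulls the affected CPU numbers out of a log like the one above. The function names (parse_cpu_list, summarize) and the SAMPLE string are made up for illustration, the regexes only match the two message formats quoted in this mail, and reading the actual record out of pstore is left out:

```python
import re

# Patterns for the two messages quoted above; other kernel versions may
# word these differently, so treat them as assumptions.
LOCKUP_RE = re.compile(r"Watchdog detected hard LOCKUP on cpu (\d+)")
STUCK_RE = re.compile(r"SMP: failed to stop secondary CPUs ([\d,\-]+)")


def parse_cpu_list(spec):
    """Expand a kernel-style CPU list such as "4-11" or "0,2-3" into ints."""
    cpus = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def summarize(log_text):
    """Return the CPUs reported as locked up and those that ignored the stop request."""
    locked = [int(m.group(1)) for m in LOCKUP_RE.finditer(log_text)]
    stuck = []
    for m in STUCK_RE.finditer(log_text):
        stuck.extend(parse_cpu_list(m.group(1)))
    return {"hard_lockup": locked, "failed_to_stop": stuck}


# Two lines copied from the panic log in this mail.
SAMPLE = """\
<0>[ 1500.017790] watchdog: CPU3: Watchdog detected hard LOCKUP on cpu 4
<4>[ 1511.085450] SMP: failed to stop secondary CPUs 4-11
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

For the log above this reports CPU 4 as the hard lockup and CPUs 4 through 11 as failing to stop, which would make it easier to check whether the set of wedged cores differs from one freeze to the next.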

~val