Re: [PATCH v2] nvme: explicitly disable APST on quirked devices

From: Kai-Heng Feng
Date: Tue Jun 27 2017 - 00:25:55 EST


On Tue, Jun 27, 2017 at 2:05 AM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> On Mon, Jun 26, 2017 at 12:01 AM, Kai-Heng Feng
> <kai.heng.feng@xxxxxxxxxxxxx> wrote:
>> A user reports APST is enabled, even when the NVMe is quirked or with
>> option "default_ps_max_latency_us=0".
>>
>> The current logic will not set APST if the device is quirked. But the
>> NVMe in question will enable APST automatically.
>>
>> Separate the logic "apst is supported" and "to enable apst", so we can
>> use the latter one to explicitly disable APST at initialization.
>
> Reviewed-by: Andy Lutomirski <luto@xxxxxxxxxx>
>
> That being said, I smell a giant WTF here. The affected hardware
> seems to have APST on by default, and APST is buggy so the disk stops
> working when APST is on. So here's the $1M question: how does the
> system *boot*? After all, it's running for a while before the kernel
> gets around to turning off APST, and I really doubt that BIOS does
> this.

In my experience, systems with those faulty NVMes have never failed to
boot, probably because the constant disk reads during boot never let
the NVMe transition to PS4. The problem always shows up after some
use following boot.

It seems the user has a tricky system. At first, APST wasn't enabled.
It became enabled after booting a new kernel, and it has stayed
enabled ever since. Even if it's disabled explicitly, APST still comes
up enabled by default on that system. The user didn't upgrade the BIOS
in the interim.
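
To spell out what the patch is meant to change for a device like this,
here is a rough standalone sketch of the split between "APST is
supported" and "we want APST enabled". It is only an illustration, not
the driver code: the struct, the enum, and the function names are
hypothetical, and a max latency of 0 is assumed to mean "APST off", as
with default_ps_max_latency_us=0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-controller state; not the driver's struct. */
struct ctrl_sketch {
	bool apst_supported;          /* device advertises APST */
	bool quirk_no_apst;           /* device is on the quirk list */
	unsigned long max_latency_us; /* assumed: 0 means "APST off" */
};

enum apst_action {
	APST_SKIP,    /* no APST support: nothing to program */
	APST_DISABLE, /* supported but unwanted: turn it off explicitly */
	APST_ENABLE,  /* supported and wanted */
};

/* "To enable APST": policy only, independent of device capability. */
static bool want_apst(const struct ctrl_sketch *c)
{
	return !c->quirk_no_apst && c->max_latency_us != 0;
}

/* "APST is supported" gates whether we touch the feature at all. */
static enum apst_action apst_decide(const struct ctrl_sketch *c)
{
	assert(c != NULL);
	if (!c->apst_supported)
		return APST_SKIP;
	return want_apst(c) ? APST_ENABLE : APST_DISABLE;
}
```

The point is the APST_DISABLE case: a quirked device that supports
APST is explicitly told to turn it off, rather than being left alone on
the assumption that it starts out disabled.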

>
> Here's a wild theory: what if the problem on all these disks is
> actually our CSTS polling? Could it be that some of the disks
> implement CSTS reads in firmware and malfunction if CSTS is read while
> in PS4? This would be a blatant spec violation, but that's never
> stopped anyone before...
>
> --Andy