RE: [PATCH V3 1/1] nvme: Add quirk for LiteON CL1 devices running FW 220TQ,22001
From: Gloria Tsai
Date: Mon Nov 02 2020 - 21:21:23 EST
Here is the rephrased problem description:
When the host issues shutdown + D3hot during suspend, the NVMe drive may pick a wrong pointer that has already been used by GC, which then causes an over-program.
Sequence: GC runs before shutdown -> host deletes the I/O queues -> host issues shutdown -> GC is interrupted -> D3hot -> drive enters PS4 -> a block swap may occur -> firmware uses the wrong pointer in device SRAM -> over-program
The issue only happens in simple suspend (shutdown + D3hot) with the specific FW on the Kahoku board.
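To make the trigger conditions explicit, here is a minimal userspace sketch of the three conditions named above (simple suspend, affected FW, Kahoku board). It is not the kernel patch and not real driver code: `struct suspend_ctx`, `over_program_risk()` and the `affected_fw` parameter are hypothetical names used only for illustration, and the firmware string is passed in by the caller rather than assumed.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of one suspend attempt; not a kernel structure. */
struct suspend_ctx {
    const char *fw_rev;      /* firmware revision reported by the drive */
    const char *board_name;  /* board name, e.g. as read from DMI */
    bool simple_suspend;     /* true: shutdown + D3hot, false: power-state based suspend */
};

/*
 * Returns true only when all three conditions from the description hold:
 * simple suspend is used, the drive runs the affected firmware, and the
 * board is Kahoku. The affected firmware string is supplied by the caller.
 */
bool over_program_risk(const struct suspend_ctx *ctx, const char *affected_fw)
{
    return ctx->simple_suspend &&
           strcmp(ctx->fw_rev, affected_fw) == 0 &&
           strcmp(ctx->board_name, "Kahoku") == 0;
}
```

Under this model, the same drive and firmware on another board, or on Kahoku with power-state based suspend, would not be flagged.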
Regards,
Gloria Tsai
_____________________________________
Sales PM Division
Solid State Storage Technology Corporation
TEL: +886-3-612-3888 ext. 2201
E-Mail: gloria.tsai@xxxxxxxxx
_____________________________________
-----Original Message-----
From: Christoph Hellwig <hch@xxxxxx>
Sent: Tuesday, November 3, 2020 2:13 AM
To: Jongpil Jung <jongpuls@xxxxxxxxx>
Cc: Keith Busch <kbusch@xxxxxxxxxx>; Jens Axboe <axboe@xxxxxx>; Christoph Hellwig <hch@xxxxxx>; Sagi Grimberg <sagi@xxxxxxxxxxx>; linux-nvme@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Gloria Tsai <Gloria.Tsai@xxxxxxxxx>; jongpil19.jung@xxxxxxxxxxx; jongheony.kim@xxxxxxxxxxx; dj54.sohn@xxxxxxxxxxx
Subject: Re: [PATCH V3 1/1] nvme: Add quirk for LiteON CL1 devices running FW 220TQ,22001
On Thu, Oct 29, 2020 at 03:55:29PM +0100, Christoph Hellwig wrote:
> I'm still worried about this.
>
> If power state based suspend does always work despite a HMB and is
> preferred for the specific Google board we should have purely a DMI
> based quirk for the board independent of the NVMe controller used with
> it.
>
> But if these LiteON devices can't properly handle nvme_dev_disable
> calls we have much deeper problems, because it can be called in all
> kinds of places, including suspending when not on this specific board.
>
> That being said, I still really do not understand this sentence and
> thus the problem at all:
>
> > When NVMe device receive D3hot from host, NVMe firmware will do
> > garbage collection. While NVMe device do Garbage collection,
> > firmware has chance to going incorrect address.
Any progress in describing the problem a little better?