Re: NVME timeout causing system hangs

From: Ashton Holmes
Date: Mon Aug 26 2019 - 23:16:39 EST


I tried looking around for the firmware online but couldn't find it, contacting adata ended up dead ending me, they said I can't find it because there hasn't ever been an firmware updates for this drive and am recommending I RMA it. Is there anything I can do to further troubleshoot this and make sure it is/isn't a hardware issue before I RMA it?

On 8/22/19 10:28 AM, Keith Busch wrote:
On Mon, Aug 19, 2019 at 04:33:45PM -0700, Ashton Holmes wrote:
When playing certain games on my PC dmesg will start spitting out NVME
timeout messages, this eventually results in BTRFS throwing errors and
remounting itself as read only. The drive passes smart's health check and
works fine when not playing games. The really weird part is this will happen
even if the game I'm playing isn't installed on that drive. I wanted to
bisect this but it happens on every kernel version I've tried. I've attached
my dmesg log. This was originally reported here
https://bugzilla.kernel.org/show_bug.cgi?id=202633 but no response was ever
given. In that report I state that 4.19.24 for whatever reason doesn't
trigger this however that doesn't seem to be the case anymore. I've updated
my UEFI since then, I wouldn't expect that to make a difference but I'm not
sure what else would have changed that.
This really looks like your nvme controller has gotten itself in an
unresponsive state: it is not responding to IO, admin, or reset
requests.

The only recommendation I have at the moment is to verify you have the
most current firmware from your vendor installed on this controller,
and update if not.


[ 170.678837] nvme nvme0: I/O 128 QID 2 timeout, aborting
[ 170.678845] nvme nvme0: I/O 129 QID 2 timeout, aborting
[ 170.678850] nvme nvme0: I/O 167 QID 2 timeout, aborting
[ 170.678853] nvme nvme0: I/O 168 QID 2 timeout, aborting
[ 170.678856] nvme nvme0: I/O 169 QID 2 timeout, aborting
[ 201.657527] nvme nvme0: I/O 128 QID 2 timeout, reset controller
[ 232.372876] nvme nvme0: I/O 8 QID 0 timeout, reset controller
[ 323.643688] nvme nvme0: Device not ready; aborting reset
[ 323.675893] print_req_error: I/O error, dev nvme0n1, sector 1088653384 flags 80700
[ 323.675902] print_req_error: I/O error, dev nvme0n1, sector 1001346664 flags 80700
[ 323.675915] print_req_error: I/O error, dev nvme0n1, sector 1088646984 flags 84700
[ 323.675920] print_req_error: I/O error, dev nvme0n1, sector 1088647240 flags 84700
[ 323.675923] print_req_error: I/O error, dev nvme0n1, sector 1088647496 flags 84700
[ 323.675927] print_req_error: I/O error, dev nvme0n1, sector 1088647752 flags 84700
[ 323.675931] print_req_error: I/O error, dev nvme0n1, sector 1088648008 flags 84700
[ 323.675935] print_req_error: I/O error, dev nvme0n1, sector 1088648264 flags 84700
[ 323.675938] print_req_error: I/O error, dev nvme0n1, sector 1088648520 flags 84700
[ 323.675942] print_req_error: I/O error, dev nvme0n1, sector 1088648776 flags 84700
[ 323.675993] nvme nvme0: Abort status: 0x7
[ 323.675995] nvme nvme0: Abort status: 0x7
[ 323.675996] nvme nvme0: Abort status: 0x7
[ 323.675998] nvme nvme0: Abort status: 0x7
[ 323.675999] nvme nvme0: Abort status: 0x7