Re: NVME timeout causing system hangs

From: Keith Busch
Date: Thu Aug 22 2019 - 13:31:00 EST


On Mon, Aug 19, 2019 at 04:33:45PM -0700, Ashton Holmes wrote:
> When playing certain games on my PC, dmesg starts spitting out NVME
> timeout messages. This eventually results in BTRFS throwing errors and
> remounting itself read-only. The drive passes SMART's health check and
> works fine when I'm not playing games. The really weird part is that this
> happens even if the game I'm playing isn't installed on that drive. I wanted
> to bisect this, but it happens on every kernel version I've tried. I've
> attached my dmesg log. This was originally reported at
> https://bugzilla.kernel.org/show_bug.cgi?id=202633 but never received a
> response. In that report I stated that 4.19.24 for whatever reason doesn't
> trigger this; however, that no longer seems to be the case. I've updated
> my UEFI since then. I wouldn't expect that to make a difference, but I'm not
> sure what else would have changed.

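This really looks like your NVMe controller has gotten itself into an
unresponsive state: it is not responding to IO, admin, or reset
requests. In your log, commands on an IO queue (QID 2) time out first,
then an admin command (QID 0) times out, and finally the reset gives up
with "Device not ready".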
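That escalation (IO timeouts, then an admin timeout, then a failed reset) can be read off the dmesg excerpt quoted below mechanically. What follows is only a rough Python sketch, not a kernel or nvme-cli tool: the function names are hypothetical, the regular expression assumes the exact message format seen in this log, and "unresponsive" here is a heuristic that simply requires all three symptoms to appear.

```python
import re
from collections import Counter

# Hypothetical helper. Assumes messages formatted like the ones in this log:
#   nvme nvme0: I/O 128 QID 2 timeout, aborting
#   nvme nvme0: I/O 8 QID 0 timeout, reset controller
TIMEOUT_RE = re.compile(
    r"nvme nvme\d+: I/O (?P<tag>\d+) QID (?P<qid>\d+) timeout, (?P<action>.+)$"
)
RESET_FAILED = "Device not ready; aborting reset"


def summarize_nvme_timeouts(lines):
    """Count timeout actions per queue type and note a failed reset.

    QID 0 is the admin queue; every other QID is an IO queue.
    """
    summary = {
        "io": Counter(),     # action -> count for IO queues (QID != 0)
        "admin": Counter(),  # action -> count for the admin queue (QID 0)
        "reset_failed": False,
    }
    for line in lines:
        m = TIMEOUT_RE.search(line)  # search() tolerates "> [ ts] " prefixes
        if m:
            queue = "admin" if int(m["qid"]) == 0 else "io"
            summary[queue][m["action"].strip()] += 1
        elif RESET_FAILED in line:
            summary["reset_failed"] = True
    return summary


def looks_unresponsive(summary):
    """Heuristic: IO timeouts, an admin timeout, and a failed reset."""
    return bool(summary["io"]) and bool(summary["admin"]) and summary["reset_failed"]


if __name__ == "__main__":
    sample = [
        "> [ 170.678837] nvme nvme0: I/O 128 QID 2 timeout, aborting",
        "> [ 201.657527] nvme nvme0: I/O 128 QID 2 timeout, reset controller",
        "> [ 232.372876] nvme nvme0: I/O 8 QID 0 timeout, reset controller",
        "> [ 323.643688] nvme nvme0: Device not ready; aborting reset",
    ]
    s = summarize_nvme_timeouts(sample)
    # prints: {'aborting': 1, 'reset controller': 1} {'reset controller': 1} True
    print(dict(s["io"]), dict(s["admin"]), looks_unresponsive(s))
```

A log with only IO-queue timeouts that recover after an abort or reset would not trip the heuristic; it is the admin timeout plus the failed reset that points at the controller itself rather than at individual commands.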

The only recommendation I have at the moment is to verify that you have
the most current firmware from your vendor installed on this controller,
and to update it if not.
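To see which firmware revision the controller is running before comparing it with the vendor's latest release, one option is the sysfs attributes the Linux NVMe driver exposes. The sketch below assumes the `model`, `serial`, and `firmware_rev` attributes under `/sys/class/nvme/nvme0/` (present on reasonably recent kernels; `nvme0` is the controller name from the log), and `read_nvme_identity` is a hypothetical helper name. `nvme id-ctrl /dev/nvme0` from nvme-cli reports the same information (the `fr` field).

```python
from pathlib import Path


def read_nvme_identity(ctrl="nvme0", sysfs_root="/sys/class/nvme"):
    """Return model, serial and firmware revision for an NVMe controller.

    Assumes the attributes live at <sysfs_root>/<ctrl>/<attr>; any
    attribute that is missing is reported as None.
    """
    base = Path(sysfs_root) / ctrl
    info = {}
    for attr in ("model", "serial", "firmware_rev"):
        path = base / attr
        # Values come from Identify Controller and are space-padded.
        info[attr] = path.read_text().strip() if path.exists() else None
    return info
```

Calling `read_nvme_identity()["firmware_rev"]` on the affected machine gives the revision string to compare against the vendor's download page; the `sysfs_root` parameter only exists so the helper can be pointed at a copy of the tree.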



> [ 170.678837] nvme nvme0: I/O 128 QID 2 timeout, aborting
> [ 170.678845] nvme nvme0: I/O 129 QID 2 timeout, aborting
> [ 170.678850] nvme nvme0: I/O 167 QID 2 timeout, aborting
> [ 170.678853] nvme nvme0: I/O 168 QID 2 timeout, aborting
> [ 170.678856] nvme nvme0: I/O 169 QID 2 timeout, aborting
> [ 201.657527] nvme nvme0: I/O 128 QID 2 timeout, reset controller
> [ 232.372876] nvme nvme0: I/O 8 QID 0 timeout, reset controller
> [ 323.643688] nvme nvme0: Device not ready; aborting reset
> [ 323.675893] print_req_error: I/O error, dev nvme0n1, sector 1088653384 flags 80700
> [ 323.675902] print_req_error: I/O error, dev nvme0n1, sector 1001346664 flags 80700
> [ 323.675915] print_req_error: I/O error, dev nvme0n1, sector 1088646984 flags 84700
> [ 323.675920] print_req_error: I/O error, dev nvme0n1, sector 1088647240 flags 84700
> [ 323.675923] print_req_error: I/O error, dev nvme0n1, sector 1088647496 flags 84700
> [ 323.675927] print_req_error: I/O error, dev nvme0n1, sector 1088647752 flags 84700
> [ 323.675931] print_req_error: I/O error, dev nvme0n1, sector 1088648008 flags 84700
> [ 323.675935] print_req_error: I/O error, dev nvme0n1, sector 1088648264 flags 84700
> [ 323.675938] print_req_error: I/O error, dev nvme0n1, sector 1088648520 flags 84700
> [ 323.675942] print_req_error: I/O error, dev nvme0n1, sector 1088648776 flags 84700
> [ 323.675993] nvme nvme0: Abort status: 0x7
> [ 323.675995] nvme nvme0: Abort status: 0x7
> [ 323.675996] nvme nvme0: Abort status: 0x7
> [ 323.675998] nvme nvme0: Abort status: 0x7
> [ 323.675999] nvme nvme0: Abort status: 0x7