NVME timeout causing system hangs

From: Ashton Holmes
Date: Mon Aug 19 2019 - 19:43:59 EST


When playing certain games on my PC dmesg will start spitting out NVME timeout messages, this eventually results in BTRFS throwing errors and remounting itself as read only. The drive passes smart's health check and works fine when not playing games. The really weird part is this will happen even if the game I'm playing isn't installed on that drive. I wanted to bisect this but it happens on every kernel version I've tried. I've attached my dmesg log. This was originally reported here https://bugzilla.kernel.org/show_bug.cgi?id=202633 but no response was ever given. In that report I state that 4.19.24 for whatever reason doesn't trigger this however that doesn't seem to be the case anymore. I've updated my UEFI since then, I wouldn't expect that to make a difference but I'm not sure what else would have changed that.

[ 170.678837] nvme nvme0: I/O 128 QID 2 timeout, aborting
[ 170.678845] nvme nvme0: I/O 129 QID 2 timeout, aborting
[ 170.678850] nvme nvme0: I/O 167 QID 2 timeout, aborting
[ 170.678853] nvme nvme0: I/O 168 QID 2 timeout, aborting
[ 170.678856] nvme nvme0: I/O 169 QID 2 timeout, aborting
[ 201.657527] nvme nvme0: I/O 128 QID 2 timeout, reset controller
[ 232.372876] nvme nvme0: I/O 8 QID 0 timeout, reset controller
[ 323.643688] nvme nvme0: Device not ready; aborting reset
[ 323.675893] print_req_error: I/O error, dev nvme0n1, sector 1088653384 flags 80700
[ 323.675902] print_req_error: I/O error, dev nvme0n1, sector 1001346664 flags 80700
[ 323.675915] print_req_error: I/O error, dev nvme0n1, sector 1088646984 flags 84700
[ 323.675920] print_req_error: I/O error, dev nvme0n1, sector 1088647240 flags 84700
[ 323.675923] print_req_error: I/O error, dev nvme0n1, sector 1088647496 flags 84700
[ 323.675927] print_req_error: I/O error, dev nvme0n1, sector 1088647752 flags 84700
[ 323.675931] print_req_error: I/O error, dev nvme0n1, sector 1088648008 flags 84700
[ 323.675935] print_req_error: I/O error, dev nvme0n1, sector 1088648264 flags 84700
[ 323.675938] print_req_error: I/O error, dev nvme0n1, sector 1088648520 flags 84700
[ 323.675942] print_req_error: I/O error, dev nvme0n1, sector 1088648776 flags 84700
[ 323.675993] nvme nvme0: Abort status: 0x7
[ 323.675995] nvme nvme0: Abort status: 0x7
[ 323.675996] nvme nvme0: Abort status: 0x7
[ 323.675998] nvme nvme0: Abort status: 0x7
[ 323.675999] nvme nvme0: Abort status: 0x7
[ 363.442595] INFO: task kworker/u64:4:279 blocked for more than 120 seconds.
[ 363.442599] Not tainted 5.2.7 #7
[ 363.442601] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.442602] kworker/u64:4 D 0 279 2 0x80004000
[ 363.442637] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 363.442639] Call Trace:
[ 363.442646] ? __schedule+0x299/0x660
[ 363.442649] schedule+0x29/0x90
[ 363.442674] wait_current_trans+0xb2/0xe0 [btrfs]
[ 363.442678] ? finish_wait+0x80/0x80
[ 363.442703] start_transaction+0x2cd/0x470 [btrfs]
[ 363.442729] btrfs_finish_ordered_io+0x233/0x830 [btrfs]
[ 363.442732] ? __switch_to_asm+0x40/0x70
[ 363.442759] normal_work_helper+0xae/0x2f0 [btrfs]
[ 363.442762] process_one_work+0x19b/0x3a0
[ 363.442764] worker_thread+0x30/0x370
[ 363.442766] ? process_one_work+0x3a0/0x3a0
[ 363.442768] kthread+0x112/0x130
[ 363.442771] ? kthread_create_on_node+0x60/0x60
[ 363.442772] ret_from_fork+0x22/0x40
[ 363.442776] INFO: task kworker/u64:6:283 blocked for more than 120 seconds.
[ 363.442777] Not tainted 5.2.7 #7
[ 363.442778] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.442779] kworker/u64:6 D 0 283 2 0x80004000
[ 363.442808] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 363.442809] Call Trace:
[ 363.442811] ? __schedule+0x299/0x660
[ 363.442814] schedule+0x29/0x90
[ 363.442837] wait_current_trans+0xb2/0xe0 [btrfs]
[ 363.442840] ? finish_wait+0x80/0x80
[ 363.442864] start_transaction+0x2cd/0x470 [btrfs]
[ 363.442889] btrfs_finish_ordered_io+0x233/0x830 [btrfs]
[ 363.442916] normal_work_helper+0xae/0x2f0 [btrfs]
[ 363.442918] process_one_work+0x19b/0x3a0
[ 363.442920] worker_thread+0x30/0x370
[ 363.442922] ? process_one_work+0x3a0/0x3a0
[ 363.442923] kthread+0x112/0x130
[ 363.442925] ? kthread_create_on_node+0x60/0x60
[ 363.442927] ret_from_fork+0x22/0x40
[ 363.442930] INFO: task kworker/u64:10:288 blocked for more than 120 seconds.
[ 363.442931] Not tainted 5.2.7 #7
[ 363.442932] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.442933] kworker/u64:10 D 0 288 2 0x80004000
[ 363.442960] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 363.442961] Call Trace:
[ 363.442964] ? __schedule+0x299/0x660
[ 363.442966] schedule+0x29/0x90
[ 363.442990] wait_current_trans+0xb2/0xe0 [btrfs]
[ 363.442992] ? finish_wait+0x80/0x80
[ 363.443016] start_transaction+0x2cd/0x470 [btrfs]
[ 363.443041] btrfs_finish_ordered_io+0x233/0x830 [btrfs]
[ 363.443043] ? __switch_to_asm+0x40/0x70
[ 363.443070] normal_work_helper+0xae/0x2f0 [btrfs]
[ 363.443072] process_one_work+0x19b/0x3a0
[ 363.443074] worker_thread+0x30/0x370
[ 363.443075] ? process_one_work+0x3a0/0x3a0
[ 363.443077] kthread+0x112/0x130
[ 363.443079] ? kthread_create_on_node+0x60/0x60
[ 363.443081] ret_from_fork+0x22/0x40
[ 363.443083] INFO: task kworker/u64:11:289 blocked for more than 120 seconds.
[ 363.443084] Not tainted 5.2.7 #7
[ 363.443085] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.443086] kworker/u64:11 D 0 289 2 0x80004000
[ 363.443114] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 363.443114] Call Trace:
[ 363.443117] ? __schedule+0x299/0x660
[ 363.443119] schedule+0x29/0x90
[ 363.443143] wait_current_trans+0xb2/0xe0 [btrfs]
[ 363.443146] ? finish_wait+0x80/0x80
[ 363.443169] start_transaction+0x2cd/0x470 [btrfs]
[ 363.443192] btrfs_finish_ordered_io+0x233/0x830 [btrfs]
[ 363.443194] ? __switch_to_asm+0x40/0x70
[ 363.443217] normal_work_helper+0xae/0x2f0 [btrfs]
[ 363.443218] process_one_work+0x19b/0x3a0
[ 363.443220] worker_thread+0x30/0x370
[ 363.443221] ? process_one_work+0x3a0/0x3a0
[ 363.443223] kthread+0x112/0x130
[ 363.443224] ? kthread_create_on_node+0x60/0x60
[ 363.443226] ret_from_fork+0x22/0x40
[ 363.443241] INFO: task btrfs-transacti:897 blocked for more than 120 seconds.
[ 363.443242] Not tainted 5.2.7 #7
[ 363.443243] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.443244] btrfs-transacti D 0 897 2 0x80004000
[ 363.443245] Call Trace:
[ 363.443247] ? __schedule+0x299/0x660
[ 363.443249] schedule+0x29/0x90
[ 363.443251] io_schedule+0x12/0x40
[ 363.443254] wait_on_page_bit+0x12e/0x1f0
[ 363.443256] ? file_fdatawait_range+0x20/0x20
[ 363.443279] read_extent_buffer_pages+0x27a/0x320 [btrfs]
[ 363.443300] btree_read_extent_buffer_pages+0x9b/0x110 [btrfs]
[ 363.443320] read_tree_block+0x47/0x60 [btrfs]
[ 363.443337] read_block_for_search.isra.36+0x1b6/0x350 [btrfs]
[ 363.443356] btrfs_search_slot+0x263/0x990 [btrfs]
[ 363.443374] lookup_inline_extent_backref+0x181/0x630 [btrfs]
[ 363.443392] ? leaf_space_used+0xdc/0x100 [btrfs]
[ 363.443410] __btrfs_free_extent.isra.79+0xf3/0x900 [btrfs]
[ 363.443428] ? pin_down_extent+0xd1/0x140 [btrfs]
[ 363.443446] ? block_rsv_release_bytes+0xfb/0x130 [btrfs]
[ 363.443468] ? btrfs_merge_delayed_refs+0x2be/0x340 [btrfs]
[ 363.443487] __btrfs_run_delayed_refs+0x6d8/0xf40 [btrfs]
[ 363.443506] ? btrfs_create_pending_block_groups+0x145/0x230 [btrfs]
[ 363.443508] ? _cond_resched+0x15/0x30
[ 363.443527] btrfs_run_delayed_refs+0xd3/0x180 [btrfs]
[ 363.443548] btrfs_commit_transaction+0x445/0x9b0 [btrfs]
[ 363.443568] ? start_transaction+0x9e/0x470 [btrfs]
[ 363.443588] transaction_kthread+0x152/0x170 [btrfs]
[ 363.443609] ? btrfs_cleanup_transaction+0x560/0x560 [btrfs]
[ 363.443610] kthread+0x112/0x130
[ 363.443612] ? kthread_create_on_node+0x60/0x60
[ 363.443613] ret_from_fork+0x22/0x40
[ 363.443645] INFO: task kworker/u64:14:1347 blocked for more than 120 seconds.
[ 363.443646] Not tainted 5.2.7 #7
[ 363.443647] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.443648] kworker/u64:14 D 0 1347 2 0x80004000
[ 363.443671] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 363.443672] Call Trace:
[ 363.443674] ? __schedule+0x299/0x660
[ 363.443676] schedule+0x29/0x90
[ 363.443696] wait_current_trans+0xb2/0xe0 [btrfs]
[ 363.443698] ? finish_wait+0x80/0x80
[ 363.443719] start_transaction+0x2cd/0x470 [btrfs]
[ 363.443740] btrfs_finish_ordered_io+0x233/0x830 [btrfs]
[ 363.443742] ? __switch_to_asm+0x40/0x70
[ 363.443764] normal_work_helper+0xae/0x2f0 [btrfs]
[ 363.443766] process_one_work+0x19b/0x3a0
[ 363.443768] worker_thread+0x30/0x370
[ 363.443769] ? process_one_work+0x3a0/0x3a0
[ 363.443770] kthread+0x112/0x130
[ 363.443772] ? kthread_create_on_node+0x60/0x60
[ 363.443773] ret_from_fork+0x22/0x40
[ 363.443776] INFO: task kworker/u64:16:1349 blocked for more than 120 seconds.
[ 363.443777] Not tainted 5.2.7 #7
[ 363.443778] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.443778] kworker/u64:16 D 0 1349 2 0x80004000
[ 363.443802] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 363.443802] Call Trace:
[ 363.443805] ? __schedule+0x299/0x660
[ 363.443807] schedule+0x29/0x90
[ 363.443827] wait_current_trans+0xb2/0xe0 [btrfs]
[ 363.443829] ? finish_wait+0x80/0x80
[ 363.443849] start_transaction+0x2cd/0x470 [btrfs]
[ 363.443870] btrfs_finish_ordered_io+0x233/0x830 [btrfs]
[ 363.443896] normal_work_helper+0xae/0x2f0 [btrfs]
[ 363.443898] process_one_work+0x19b/0x3a0
[ 363.443900] worker_thread+0x30/0x370
[ 363.443901] ? process_one_work+0x3a0/0x3a0
[ 363.443903] kthread+0x112/0x130
[ 363.443905] ? kthread_create_on_node+0x60/0x60
[ 363.443907] ret_from_fork+0x22/0x40
[ 363.443912] INFO: task CIPCServer::Thr:1617 blocked for more than 120 seconds.
[ 363.443914] Not tainted 5.2.7 #7
[ 363.443915] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.443916] CIPCServer::Thr D 0 1617 1352 0x20020004
[ 363.443918] Call Trace:
[ 363.443920] ? __schedule+0x299/0x660
[ 363.443924] ? blk_flush_plug_list+0xd7/0x100
[ 363.443926] schedule+0x29/0x90
[ 363.443928] io_schedule+0x12/0x40
[ 363.443930] wait_on_page_bit+0x12e/0x1f0
[ 363.443933] ? file_fdatawait_range+0x20/0x20
[ 363.443962] read_extent_buffer_pages+0x27a/0x320 [btrfs]
[ 363.443983] btree_read_extent_buffer_pages+0x9b/0x110 [btrfs]
[ 363.444008] read_tree_block+0x47/0x60 [btrfs]
[ 363.444030] read_block_for_search.isra.36+0x1b6/0x350 [btrfs]
[ 363.444054] btrfs_search_slot+0x263/0x990 [btrfs]
[ 363.444080] btrfs_lookup_csum+0x52/0x150 [btrfs]
[ 363.444084] ? kmem_cache_alloc+0x150/0x1c0
[ 363.444103] __btrfs_lookup_bio_sums+0x228/0x660 [btrfs]
[ 363.444106] ? _cond_resched+0x15/0x30
[ 363.444126] btrfs_submit_bio_hook+0x103/0x170 [btrfs]
[ 363.444149] submit_one_bio+0x31/0x50 [btrfs]
[ 363.444173] extent_readpages+0x2aa/0x370 [btrfs]
[ 363.444178] ? tomoyo_check_open_permission+0xa6/0x190
[ 363.444180] read_pages+0x6b/0x190
[ 363.444183] __do_page_cache_readahead+0x1c1/0x1e0
[ 363.444186] ondemand_readahead+0x182/0x2f0
[ 363.444189] generic_file_read_iter+0x818/0xc80
[ 363.444193] new_sync_read+0x12a/0x1c0
[ 363.444195] vfs_read+0x91/0x140
[ 363.444197] ksys_read+0x59/0xd0
[ 363.444199] do_fast_syscall_32+0x94/0x204
[ 363.444202] entry_SYSCALL_compat_after_hwframe+0x45/0x4d
[ 363.444205] INFO: task CJobMgr::m_Work:1632 blocked for more than 120 seconds.
[ 363.444207] Not tainted 5.2.7 #7
[ 363.444207] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.444209] CJobMgr::m_Work D 0 1632 1352 0x20020000
[ 363.444210] Call Trace:
[ 363.444212] ? __schedule+0x299/0x660
[ 363.444214] schedule+0x29/0x90
[ 363.444216] io_schedule+0x12/0x40
[ 363.444217] wait_on_page_bit+0x12e/0x1f0
[ 363.444219] ? file_fdatawait_range+0x20/0x20
[ 363.444243] read_extent_buffer_pages+0x27a/0x320 [btrfs]
[ 363.444263] btree_read_extent_buffer_pages+0x9b/0x110 [btrfs]
[ 363.444283] read_tree_block+0x47/0x60 [btrfs]
[ 363.444300] read_block_for_search.isra.36+0x1b6/0x350 [btrfs]
[ 363.444318] btrfs_search_slot+0x263/0x990 [btrfs]
[ 363.444339] ? do_async_commit+0x30/0x30 [btrfs]
[ 363.444358] btrfs_lookup_inode+0x3a/0xc0 [btrfs]
[ 363.444361] ? kmem_cache_alloc+0x150/0x1c0
[ 363.444382] btrfs_iget_path+0x537/0x6e0 [btrfs]
[ 363.444404] btrfs_lookup_dentry+0x469/0x570 [btrfs]
[ 363.444426] btrfs_lookup+0xe/0x30 [btrfs]
[ 363.444428] __lookup_slow+0x97/0x150
[ 363.444430] lookup_slow+0x35/0x50
[ 363.444432] walk_component+0x1bf/0x330
[ 363.444435] path_lookupat.isra.46+0x6d/0x220
[ 363.444436] ? mutex_lock+0xe/0x30
[ 363.444438] filename_lookup.part.62+0xa0/0x170
[ 363.444441] ? strncpy_from_user+0x4f/0x1a0
[ 363.444443] ? getname_flags+0x6a/0x1e0
[ 363.444444] do_faccessat+0xac/0x250
[ 363.444446] do_fast_syscall_32+0x94/0x204
[ 363.444448] entry_SYSCALL_compat_after_hwframe+0x45/0x4d
[ 363.444461] INFO: task steamerrorrepor:2016 blocked for more than 120 seconds.
[ 363.444462] Not tainted 5.2.7 #7
[ 363.444462] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.444463] steamerrorrepor D 0 2016 1 0x20020000
[ 363.444464] Call Trace:
[ 363.444467] ? __schedule+0x299/0x660
[ 363.444468] ? blk_flush_plug_list+0xd7/0x100
[ 363.444470] schedule+0x29/0x90
[ 363.444472] io_schedule+0x12/0x40
[ 363.444473] wait_on_page_bit+0x12e/0x1f0
[ 363.444475] ? file_fdatawait_range+0x20/0x20
[ 363.444498] read_extent_buffer_pages+0x27a/0x320 [btrfs]
[ 363.444518] btree_read_extent_buffer_pages+0x9b/0x110 [btrfs]
[ 363.444539] read_tree_block+0x47/0x60 [btrfs]
[ 363.444556] read_block_for_search.isra.36+0x1b6/0x350 [btrfs]
[ 363.444574] btrfs_search_slot+0x263/0x990 [btrfs]
[ 363.444593] btrfs_lookup_csum+0x52/0x150 [btrfs]
[ 363.444596] ? kmem_cache_alloc+0x150/0x1c0
[ 363.444615] __btrfs_lookup_bio_sums+0x228/0x660 [btrfs]
[ 363.444617] ? _cond_resched+0x15/0x30
[ 363.444638] btrfs_submit_bio_hook+0x103/0x170 [btrfs]
[ 363.444660] submit_one_bio+0x31/0x50 [btrfs]
[ 363.444683] extent_readpages+0x2aa/0x370 [btrfs]
[ 363.444685] ? tomoyo_check_open_permission+0xa6/0x190
[ 363.444687] read_pages+0x6b/0x190
[ 363.444690] __do_page_cache_readahead+0x1c1/0x1e0
[ 363.444692] ondemand_readahead+0x182/0x2f0
[ 363.444693] ? pagecache_get_page+0x30/0x2d0
[ 363.444695] generic_file_read_iter+0x818/0xc80
[ 363.444697] ? do_filp_open+0xa7/0x100
[ 363.444699] new_sync_read+0x12a/0x1c0
[ 363.444701] vfs_read+0x91/0x140
[ 363.444702] ksys_read+0x59/0xd0
[ 363.444704] do_int80_syscall_32+0x50/0xf0
[ 363.444706] entry_INT80_compat+0x82/0x90
[ 384.214291] nvme nvme0: Device not ready; aborting reset
[ 384.214297] nvme nvme0: Removing after probe failure status: -19
[ 444.833351] nvme nvme0: Device not ready; aborting reset
[ 444.857550] print_req_error: 17 callbacks suppressed
[ 444.857553] print_req_error: I/O error, dev nvme0n1, sector 2126920 flags 1000
[ 444.857556] print_req_error: I/O error, dev nvme0n1, sector 1583912 flags 1000
[ 444.857559] print_req_error: I/O error, dev nvme0n1, sector 481082728 flags 1000
[ 444.857563] BTRFS error (device dm-1): bdev /dev/mapper/Storage errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
[ 444.857567] BTRFS error (device dm-1): bdev /dev/mapper/Storage errs: wr 1, rd 1, flush 0, corrupt 0, gen 0
[ 444.857569] BTRFS error (device dm-1): bdev /dev/mapper/Storage errs: wr 1, rd 2, flush 0, corrupt 0, gen 0
[ 444.857572] BTRFS error (device dm-1): bdev /dev/mapper/Storage errs: wr 1, rd 3, flush 0, corrupt 0, gen 0
[ 444.857577] BTRFS error (device dm-1): bdev /dev/mapper/Storage errs: wr 2, rd 3, flush 0, corrupt 0, gen 0
[ 444.857592] BTRFS error (device dm-1): bdev /dev/mapper/Storage errs: wr 2, rd 4, flush 0, corrupt 0, gen 0
[ 444.857609] BTRFS info (device dm-1): no csum found for inode 27605208 start 5177344
[ 444.857618] BTRFS info (device dm-1): no csum found for inode 29805279 start 0
[ 444.857632] BTRFS: error (device dm-1) in __btrfs_free_extent:6983: errno=-5 IO failure
[ 444.857635] BTRFS info (device dm-1): forced readonly
[ 444.857638] BTRFS: error (device dm-1) in btrfs_run_delayed_refs:2907: errno=-5 IO failure
[ 444.857642] BTRFS warning (device dm-1): Skipping commit of aborted transaction.
[ 444.857643] BTRFS: error (device dm-1) in cleanup_transaction:1794: errno=-5 IO failure
[ 444.857650] BTRFS error (device dm-1): bdev /dev/mapper/Storage errs: wr 2, rd 5, flush 0, corrupt 0, gen 0
[ 444.857689] BTRFS error (device dm-1): bdev /dev/mapper/Storage errs: wr 2, rd 6, flush 0, corrupt 0, gen 0
[ 444.857694] BTRFS info (device dm-1): no csum found for inode 29805279 start 4096
[ 444.857707] BTRFS error (device dm-1): bdev /dev/mapper/Storage errs: wr 2, rd 7, flush 0, corrupt 0, gen 0
[ 444.857726] BTRFS info (device dm-1): no csum found for inode 29805279 start 8192
[ 444.857737] BTRFS error (device dm-1): bdev /dev/mapper/Storage errs: wr 2, rd 8, flush 0, corrupt 0, gen 0
[ 444.857753] BTRFS info (device dm-1): no csum found for inode 29805279 start 12288
[ 444.857813] BTRFS info (device dm-1): no csum found for inode 29805279 start 0
[ 444.857819] BTRFS info (device dm-1): no csum found for inode 27605208 start 5177344
[ 444.857860] BTRFS info (device dm-1): no csum found for inode 29805279 start 0
[ 444.858292] BTRFS info (device dm-1): no csum found for inode 29805279 start 0
[ 444.858328] BTRFS info (device dm-1): no csum found for inode 29805279 start 0