VM Boot Hangs with Commit "Revert "scsi: core: run queue if SCSI device queue isn't ready and queue is idle"" on linux-5.4.y

From: Sherry Yang
Date: Mon Jul 24 2023 - 01:28:28 EST


Hi Douglas,

We observed linux-stable v5.4 VM boot hangs, but probably only 1 in thousands of boots (less than 10,000 boots). We started 16 VMs on a Bare Metal with loop reboots, I chose 10,000 boots as a threshold, and bisected it. After a painful bisection, I found the culprit commit 578c8f09c04b (“Revert scsi: core: run queue if SCSI device queue isnt ready and queue is idle”). This commit actually was merged to v5.8 the 1st time. It's a series of patch set (https://www.spinics.net/lists/linux-block/msg51866.html). Actually, in the 4-patch series, 2 of them have already been backported to linux-stable v5.4, but not at the same time:

1) ab3cee3762e5 (“blk-mq: In blk_mq_dispatch_rq_list() no budget is a reason to kick”) in tag v5.4.86
2) 578c8f09c04b (“Revert scsi: core: run queue if SCSI device queue isnt ready and queue is idle”) in tag v5.4.235, it’s backported as stable dependency for another commit

Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx>
Acked-by: Martin K. Petersen <martin.petersen@xxxxxxxxxx>
Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
Stable-dep-of: c31e76bcc379 ("blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx”)
Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

And I tried backporting the other 2 patches to v5.4, the issue is still reproducible.

I tested multiple kernels, the issue is not reproducible within 10,000 boots in the following kernels:
1) Linux v5.9
2) Linux v5.4.249 + revert of 578c8f09c04b (“Revert scsi: core: run queue if SCSI device queue isnt ready and queue is idle”)

Not exactly sure how this commit is affecting linux-stable v5.4, but I suspect some prerequisite commits are missing which lead to boot hangs on linux-stable v5.4 but not on higher releases. Could you take a look at this issue and share your insight?

Here is the call trace:
[ 369.850681] INFO: task systemd-udevd:313 blocked for more than 122 seconds.
[ 369.852180] Not tainted 5.4.248-master.20230608.el8.dev.x86_64 #1
[ 369.853631] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 369.855406] systemd-udevd D 0 313 282 0x80004106
[ 369.856662] Call Trace:
[ 369.857276] __schedule+0x213/0x570
[ 369.858017] schedule+0x42/0xb0
[ 369.858699] io_schedule+0x45/0x70
[ 369.859671] wait_on_page_bit_common+0x131/0x380
[ 369.860884] ? find_get_entries+0x1a9/0x260
[ 369.862011] ? __filemap_set_wb_err+0x70/0x70
[ 369.863239] __lock_page+0x44/0x50
[ 369.864122] truncate_inode_pages_range+0x463/0x8a0
[ 369.865516] ? pagevec_lookup_range_tag+0x28/0x40
[ 369.866891] ? free_cpumask_var+0x9/0x10
[ 369.867909] ? mark_buffer_async_write+0x30/0x30
[ 369.869019] ? get_ksm_page+0xf6/0x210
[ 369.869936] ? free_cpumask_var+0x9/0x10
[ 369.871020] ? on_each_cpu_cond_mask+0xb1/0x130
[ 369.872062] truncate_inode_pages+0x15/0x20
[ 369.872739] __blkdev_put+0xa7/0x220
[ 369.873408] ? exit_mmap+0x121/0x1b0
[ 369.874092] blkdev_put+0x4e/0xe0
[ 369.874674] blkdev_close+0x26/0x30
[ 369.875281] __fput+0xcc/0x260
[ 369.875787] ____fput+0xe/0x10
[ 369.876298] task_work_run+0x8b/0xb0
[ 369.876892] do_exit+0x1ff/0x420
[ 369.877414] do_group_exit+0x3b/0xb0
[ 369.877989] get_signal+0x169/0x8b0
[ 369.878549] do_signal+0x2a/0x100
[ 369.879130] ? __vfs_read+0x29/0x40
[ 369.879713] ? vfs_read+0xaa/0x160
[ 369.880281] ? ksys_read+0x67/0xe0
[ 369.880858] prepare_exit_to_usermode+0x12b/0x1a0
[ 369.881614] do_syscall_64+0x8e/0x100
[ 369.882199] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[ 369.883056] RIP: 0033:0x7f6d466de8c2
[ 369.883664] Code: Bad RIP value.
[ 369.884217] RSP: 002b:00007fffffadb558 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 369.885433] RAX: fffffffffffffffc RBX: 000055b0b3c40ab0 RCX: 00007f6d466de8c2
[ 369.886552] RDX: 0000000000000200 RSI: 000055b0b3c40ad8 RDI: 000000000000000f
[ 369.887692] RBP: 000055b0b3c3dd20 R08: 000055b0b3c40ab0 R09: 0000000000000001
[ 369.888842] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000004000
[ 369.889963] R13: 0000000000000200 R14: 000055b0b3c3dd70 R15: 000055b0b3c40ac8
[ 369.891529] INFO: task systemd-udevd:315 blocked for more than 122 seconds.
[ 369.893182] Not tainted 5.4.248-master.20230608.el8.dev.x86_64 #1
[ 369.894557] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 369.896318] systemd-udevd D 0 315 282 0x80004106
[ 369.897596] Call Trace:
[ 369.898197] __schedule+0x213/0x570
[ 369.899068] schedule+0x42/0xb0
[ 369.899813] io_schedule+0x45/0x70
[ 369.900640] wait_on_page_bit_common+0x131/0x380
[ 369.901711] ? find_get_entries+0x1a9/0x260
[ 369.902688] ? __filemap_set_wb_err+0x70/0x70
[ 369.903710] __lock_page+0x44/0x50
[ 369.904504] truncate_inode_pages_range+0x463/0x8a0
[ 369.905648] ? find_get_pages_range_tag+0x7e/0x2d0
[ 369.906777] ? pagevec_lookup_range_tag+0x28/0x40
[ 369.907881] ? free_cpumask_var+0x9/0x10
[ 369.908816] ? mark_buffer_async_write+0x30/0x30
[ 369.909836] ? get_ksm_page+0xf0/0x210
[ 369.910738] ? __x64_sys_fsopen+0x160/0x160
[ 369.911743] ? free_cpumask_var+0x9/0x10
[ 369.912694] ? on_each_cpu_cond_mask+0xb1/0x130
[ 369.913763] truncate_inode_pages+0x15/0x20
[ 369.914747] __blkdev_put+0xa7/0x220
[ 369.915565] ? exit_mmap+0x121/0x1b0
[ 369.916407] blkdev_put+0x4e/0xe0
[ 369.917163] blkdev_close+0x26/0x30
[ 369.917960] __fput+0xcc/0x260
[ 369.918669] ____fput+0xe/0x10
[ 369.919362] task_work_run+0x8b/0xb0
[ 369.920185] do_exit+0x1ff/0x420
[ 369.920953] do_group_exit+0x3b/0xb0
[ 369.921776] get_signal+0x169/0x8b0
[ 369.922597] do_signal+0x2a/0x100
[ 369.923466] ? __vfs_read+0x29/0x40
[ 369.924424] ? vfs_read+0xaa/0x160
[ 369.925256] ? ksys_read+0x67/0xe0
[ 369.926085] prepare_exit_to_usermode+0x12b/0x1a0
[ 369.927204] do_syscall_64+0x8e/0x100
[ 369.928063] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[ 369.929238] RIP: 0033:0x7f6d466de8c2
[ 369.930056] Code: Bad RIP value.
[ 369.930799] RSP: 002b:00007fffffadb518 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 369.932496] RAX: fffffffffffffffc RBX: 000055b0b3c3dd30 RCX: 00007f6d466de8c2
[ 369.934119] RDX: 0000000000000040 RSI: 000055b0b3c3dd58 RDI: 000000000000000e
[ 369.935700] RBP: 000055b0b3c3dbe0 R08: 000055b0b3c3dd30 R09: 0000000000000001
[ 369.937300] R10: 0000000000000001 R11: 0000000000000246 R12: 00000002e6df0000
[ 369.938873] R13: 0000000000000040 R14: 000055b0b3c3dc30 R15: 000055b0b3c3dd48

Thanks,
Sherry