Re: [PATCH 3/3] blk-mq: Fix the queue freezing mechanism

From: Bart Van Assche
Date: Thu Sep 24 2015 - 14:09:47 EST


On 09/24/2015 10:49 AM, Tejun Heo wrote:
> Again, that doesn't happen.

Hello Tejun,

In case anyone is interested, the backtraces for the lockup I
observed are as follows:

sysrq: SysRq : Show Blocked State
task PC stack pid father
kworker/4:0 D ffff88045c5d5a00 0 29 2 0x00000000
Workqueue: srp_remove srp_remove_work [ib_srp]
ffff88045c767c08 0000000000000086 ffffffff810ba11d ffff88047fd15ad8
ffff88045c5d5a00 ffff88045c768000 ffff88045c768000 ffff88041c737ab8
ffff880036f7bcf8 ffff8804158b55e8 0000000000000100 ffff88045c767c20
Call Trace:
[<ffffffff810ba11d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff814effea>] schedule+0x3a/0x90
[<ffffffff81271a36>] blk_mq_freeze_queue_wait+0x56/0xb0
[<ffffffff810b4620>] ? prepare_to_wait_event+0xf0/0xf0
[<ffffffff81273b64>] blk_mq_update_tag_set_depth+0x44/0xb0
[<ffffffff81275620>] blk_mq_free_queue+0x50/0x110
[<ffffffff81266468>] blk_cleanup_queue+0x148/0x240
[<ffffffffa001aaf5>] __scsi_remove_device+0x65/0xd0 [scsi_mod]
[<ffffffffa0019249>] scsi_forget_host+0x69/0x70 [scsi_mod]
[<ffffffffa000d292>] scsi_remove_host+0x82/0x130 [scsi_mod]
[<ffffffffa02a5c80>] srp_remove_work+0x90/0x1f0 [ib_srp]
[<ffffffff8108d9b8>] process_one_work+0x1d8/0x610
[<ffffffff8108d92b>] ? process_one_work+0x14b/0x610
[<ffffffff8108df04>] worker_thread+0x114/0x460
[<ffffffff8108ddf0>] ? process_one_work+0x610/0x610
[<ffffffff810941f8>] kthread+0xf8/0x110
[<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
[<ffffffff814f5eaf>] ret_from_fork+0x3f/0x70
[<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
kworker/1:2 D ffff88045c648000 0 264 2 0x00000000
Workqueue: srp_remove srp_remove_work [ib_srp]
ffff880444d47c08 0000000000000086 ffffffff810ba11d ffff88047fc55ad8
ffff88045c648000 ffff880450a9da00 ffff880444d48000 ffff88042826c8d8
ffff880428969790 ffff880413b27b50 0000000000000040 ffff880444d47c20
Call Trace:
[<ffffffff810ba11d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff814effea>] schedule+0x3a/0x90
[<ffffffff81271a36>] blk_mq_freeze_queue_wait+0x56/0xb0
[<ffffffff810b4620>] ? prepare_to_wait_event+0xf0/0xf0
[<ffffffff81273b64>] blk_mq_update_tag_set_depth+0x44/0xb0
[<ffffffff81275620>] blk_mq_free_queue+0x50/0x110
[<ffffffff81266468>] blk_cleanup_queue+0x148/0x240
[<ffffffffa001aaf5>] __scsi_remove_device+0x65/0xd0 [scsi_mod]
[<ffffffffa0019249>] scsi_forget_host+0x69/0x70 [scsi_mod]
[<ffffffffa000d292>] scsi_remove_host+0x82/0x130 [scsi_mod]
[<ffffffffa02a5c80>] srp_remove_work+0x90/0x1f0 [ib_srp]
[<ffffffff8108d9b8>] process_one_work+0x1d8/0x610
[<ffffffff8108d92b>] ? process_one_work+0x14b/0x610
[<ffffffff8108df04>] worker_thread+0x114/0x460
[<ffffffff8108ddf0>] ? process_one_work+0x610/0x610
[<ffffffff810941f8>] kthread+0xf8/0x110
[<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
[<ffffffff814f5eaf>] ret_from_fork+0x3f/0x70
[<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
kworker/7:0 D ffff88045c675a00 0 27179 2 0x00000000
Workqueue: srp_remove srp_remove_work [ib_srp]
ffff8803f7333c08 0000000000000086 ffffffff810ba11d ffff88047fdd5ad8
ffff88045c675a00 ffff8803f7139680 ffff8803f7334000 ffff880403d76e40
ffff88040c412408 ffff8803fe493cf8 00000000000001c0 ffff8803f7333c20
Call Trace:
[<ffffffff810ba11d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff814effea>] schedule+0x3a/0x90
[<ffffffff81271a36>] blk_mq_freeze_queue_wait+0x56/0xb0
[<ffffffff810b4620>] ? prepare_to_wait_event+0xf0/0xf0
[<ffffffff81273b64>] blk_mq_update_tag_set_depth+0x44/0xb0
[<ffffffff81275620>] blk_mq_free_queue+0x50/0x110
[<ffffffff81266468>] blk_cleanup_queue+0x148/0x240
[<ffffffffa001aaf5>] __scsi_remove_device+0x65/0xd0 [scsi_mod]
[<ffffffffa0019249>] scsi_forget_host+0x69/0x70 [scsi_mod]
[<ffffffffa000d292>] scsi_remove_host+0x82/0x130 [scsi_mod]
[<ffffffffa02a5c80>] srp_remove_work+0x90/0x1f0 [ib_srp]
[<ffffffff8108d9b8>] process_one_work+0x1d8/0x610
[<ffffffff8108d92b>] ? process_one_work+0x14b/0x610
[<ffffffff8108df04>] worker_thread+0x114/0x460
[<ffffffff8108ddf0>] ? process_one_work+0x610/0x610
[<ffffffff810941f8>] kthread+0xf8/0x110
[<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
[<ffffffff814f5eaf>] ret_from_fork+0x3f/0x70
[<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
kworker/3:0 D ffff88045c649680 0 21529 2 0x00000000
Workqueue: srp_remove srp_remove_work [ib_srp]
ffff880392f2fc08 0000000000000086 ffffffff810ba11d ffff88047fcd5ad8
ffff88045c649680 ffff88039a72c380 ffff880392f30000 ffff880420f2fab8
ffff880425896ed8 ffff880416043cf8 00000000000000c0 ffff880392f2fc20
Call Trace:
[<ffffffff810ba11d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff814effea>] schedule+0x3a/0x90
[<ffffffff81271a36>] blk_mq_freeze_queue_wait+0x56/0xb0
[<ffffffff810b4620>] ? prepare_to_wait_event+0xf0/0xf0
[<ffffffff81273b64>] blk_mq_update_tag_set_depth+0x44/0xb0
[<ffffffff81275620>] blk_mq_free_queue+0x50/0x110
[<ffffffff81266468>] blk_cleanup_queue+0x148/0x240
[<ffffffffa001aaf5>] __scsi_remove_device+0x65/0xd0 [scsi_mod]
[<ffffffffa0019249>] scsi_forget_host+0x69/0x70 [scsi_mod]
[<ffffffffa000d292>] scsi_remove_host+0x82/0x130 [scsi_mod]
[<ffffffffa02a5c80>] srp_remove_work+0x90/0x1f0 [ib_srp]
[<ffffffff8108d9b8>] process_one_work+0x1d8/0x610
[<ffffffff8108d92b>] ? process_one_work+0x14b/0x610
[<ffffffff8108df04>] worker_thread+0x114/0x460
[<ffffffff8108ddf0>] ? process_one_work+0x610/0x610
[<ffffffff810941f8>] kthread+0xf8/0x110
[<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
[<ffffffff814f5eaf>] ret_from_fork+0x3f/0x70
[<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
kworker/5:0 D ffff88045c64ad00 0 10950 2 0x00000000
Workqueue: srp_remove srp_remove_work [ib_srp]
ffff88037111bc08 0000000000000086 ffffffff810ba11d ffff88047fd55ad8
ffff88045c64ad00 ffff880055991680 ffff88037111c000 ffff88041ff348d8
ffff880424463080 ffff8804145155e8 0000000000000140 ffff88037111bc20
Call Trace:
[<ffffffff810ba11d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff814effea>] schedule+0x3a/0x90
[<ffffffff81271a36>] blk_mq_freeze_queue_wait+0x56/0xb0
[<ffffffff810b4620>] ? prepare_to_wait_event+0xf0/0xf0
[<ffffffff81273b64>] blk_mq_update_tag_set_depth+0x44/0xb0
[<ffffffff81275620>] blk_mq_free_queue+0x50/0x110
[<ffffffff81266468>] blk_cleanup_queue+0x148/0x240
[<ffffffffa001aaf5>] __scsi_remove_device+0x65/0xd0 [scsi_mod]
[<ffffffffa0019249>] scsi_forget_host+0x69/0x70 [scsi_mod]
[<ffffffffa000d292>] scsi_remove_host+0x82/0x130 [scsi_mod]
[<ffffffffa02a5c80>] srp_remove_work+0x90/0x1f0 [ib_srp]
[<ffffffff8108d9b8>] process_one_work+0x1d8/0x610
[<ffffffff8108d92b>] ? process_one_work+0x14b/0x610
[<ffffffff8108df04>] worker_thread+0x114/0x460
[<ffffffff8108ddf0>] ? process_one_work+0x610/0x610
[<ffffffff810941f8>] kthread+0xf8/0x110
[<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
[<ffffffff814f5eaf>] ret_from_fork+0x3f/0x70
[<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
kworker/6:1 D ffff88045c670000 0 17446 2 0x00000000
Workqueue: srp_remove srp_remove_work [ib_srp]
ffff880052effc08 0000000000000086 ffffffff810ba11d ffff88047fd95ad8
ffff88045c670000 ffff88035e401680 ffff880052f00000 ffff8803ecf8ee40
ffff8803eed58b18 ffff880067a88b18 0000000000000180 ffff880052effc20
Call Trace:
[<ffffffff810ba11d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff814effea>] schedule+0x3a/0x90
[<ffffffff81271a36>] blk_mq_freeze_queue_wait+0x56/0xb0
[<ffffffff810b4620>] ? prepare_to_wait_event+0xf0/0xf0
[<ffffffff81273b64>] blk_mq_update_tag_set_depth+0x44/0xb0
[<ffffffff81275620>] blk_mq_free_queue+0x50/0x110
[<ffffffff81266468>] blk_cleanup_queue+0x148/0x240
[<ffffffffa001aaf5>] __scsi_remove_device+0x65/0xd0 [scsi_mod]
[<ffffffffa0019249>] scsi_forget_host+0x69/0x70 [scsi_mod]
[<ffffffffa000d292>] scsi_remove_host+0x82/0x130 [scsi_mod]
[<ffffffffa02a5c80>] srp_remove_work+0x90/0x1f0 [ib_srp]
[<ffffffff8108d9b8>] process_one_work+0x1d8/0x610
[<ffffffff8108d92b>] ? process_one_work+0x14b/0x610
[<ffffffff8108df04>] worker_thread+0x114/0x460
[<ffffffff8108ddf0>] ? process_one_work+0x610/0x610
[<ffffffff810941f8>] kthread+0xf8/0x110
[<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
[<ffffffff814f5eaf>] ret_from_fork+0x3f/0x70
[<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
kworker/0:0 D ffff88035d205a00 0 23356 2 0x00000000
Workqueue: srp_remove srp_remove_work [ib_srp]
ffff88035ec83c08 0000000000000086 ffffffff810ba11d ffff88047fc15ad8
ffff88035d205a00 ffff880360a3ad00 ffff88035ec84000 ffff8803e75a61c8
ffff8803eed59790 ffff8803e4c10b18 0000000000000000 ffff88035ec83c20
Call Trace:
[<ffffffff810ba11d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff814effea>] schedule+0x3a/0x90
[<ffffffff81271a36>] blk_mq_freeze_queue_wait+0x56/0xb0
[<ffffffff810b4620>] ? prepare_to_wait_event+0xf0/0xf0
[<ffffffff81273b64>] blk_mq_update_tag_set_depth+0x44/0xb0
[<ffffffff81275620>] blk_mq_free_queue+0x50/0x110
[<ffffffff81266468>] blk_cleanup_queue+0x148/0x240
[<ffffffffa001aaf5>] __scsi_remove_device+0x65/0xd0 [scsi_mod]
[<ffffffffa0019249>] scsi_forget_host+0x69/0x70 [scsi_mod]
[<ffffffffa000d292>] scsi_remove_host+0x82/0x130 [scsi_mod]
[<ffffffffa02a5c80>] srp_remove_work+0x90/0x1f0 [ib_srp]
[<ffffffff8108d9b8>] process_one_work+0x1d8/0x610
[<ffffffff8108d92b>] ? process_one_work+0x14b/0x610
[<ffffffff8108df04>] worker_thread+0x114/0x460
[<ffffffff8108ddf0>] ? process_one_work+0x610/0x610
[<ffffffff810941f8>] kthread+0xf8/0x110
[<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
[<ffffffff814f5eaf>] ret_from_fork+0x3f/0x70
[<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
modprobe D ffff88045c64ad00 0 24137 17294 0x00000004
ffff8804146e7c48 0000000000000082 ffff880400000001 ffff88047fd55ad8
ffff88045c64ad00 ffff88045c5d0000 ffff8804146e8000 ffff8804146e7df0
ffff8804146e7de8 ffff88045c5d0000 ffff8804146e7dd0 ffff8804146e7c60
Call Trace:
[<ffffffff814effea>] schedule+0x3a/0x90
[<ffffffff814f4463>] schedule_timeout+0x1f3/0x290
[<ffffffff810b9f16>] ? mark_held_locks+0x66/0x90
[<ffffffff814f520c>] ? _raw_spin_unlock_irq+0x2c/0x40
[<ffffffff810ba042>] ? trace_hardirqs_on_caller+0x102/0x1d0
[<ffffffff814f1246>] wait_for_completion+0xd6/0x110
[<ffffffff8109ebd0>] ? wake_up_q+0x70/0x70
[<ffffffff8108a7cf>] flush_workqueue+0x1cf/0x6b0
[<ffffffff8108a605>] ? flush_workqueue+0x5/0x6b0
[<ffffffffa02a51a1>] srp_remove_one+0xc1/0x130 [ib_srp]
[<ffffffffa02b77ae>] ib_unregister_client+0x11e/0x1a0 [ib_core]
[<ffffffffa02aa471>] srp_cleanup_module+0x10/0xb9f [ib_srp]
[<ffffffff810f106f>] SyS_delete_module+0x16f/0x1f0
[<ffffffff81003017>] ? trace_hardirqs_on_thunk+0x17/0x19
[<ffffffff814f5af6>] entry_SYSCALL_64_fastpath+0x16/0x7a
