Re: [RFC PATCH v2 2/2] blk-mq: Lockout tagset iter when freeing rqs
From: John Garry
Date: Wed Dec 23 2020 - 06:43:32 EST
Are there any blk_mq_tagset_busy_iter() calls that happen from a
context where the tag set can disappear while that function is in
progress?
So isn't the blk_mq_tag_set always a member of the host driver data in
those cases? And since blk_mq_tagset_busy_iter() is for iterating block
driver tags and is called from a block driver or hctx_busy_show(), the
tag set would exist for the lifetime of the host device.
Some blk_mq_tagset_busy_iter() calls happen from a context where it is
not allowed to sleep but also where it is guaranteed that the tag set
won't disappear, e.g. the call from inside skd_mq_queue_rq().
You're talking about skd_mq_queue_rq() -> skd_in_flight() ->
blk_mq_tagset_busy_iter(), right?
So I would expect any .queue_rq calls to complete before the associated
request queue and tagset may be unregistered.
How about using a mutex inside blk_mq_queue_tag_busy_iter() instead?
As far as I can see, all blk_mq_queue_tag_busy_iter() calls happen from
a context where it is allowed to sleep.
Well then it seems sensible to add might_sleep() also.
And we still have the blk_mq_queue_tag_busy_iter() problem. As Ming
mentioned yesterday, we know of contexts from which it is called that
may not sleep.
Sorry, I got the 2x iter functions mixed up.
So if we use a mutex to solve the blk_mq_queue_tag_busy_iter() problem,
then we still have this issue in blk_mq_tagset_busy_iter(), which I
reported previously [0]:
[ 319.771745] BUG: KASAN: use-after-free in bt_tags_iter+0xe0/0x128
[ 319.777832] Read of size 4 at addr ffff0010b6bd27cc by task more/1866
[ 319.784262]
[ 319.785753] CPU: 61 PID: 1866 Comm: more Tainted: G W 5.10.0-rc4-18118-gaa7b9c30d8ff #1070
[ 319.795312] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018
[ 319.804437] Call trace:
[ 319.806892] dump_backtrace+0x0/0x2d0
[ 319.810552] show_stack+0x18/0x68
[ 319.813865] dump_stack+0x100/0x16c
[ 319.817348] print_address_description.constprop.12+0x6c/0x4e8
[ 319.823176] kasan_report+0x130/0x200
[ 319.826831] __asan_load4+0x9c/0xd8
[ 319.830315] bt_tags_iter+0xe0/0x128
[ 319.833884] __blk_mq_all_tag_iter+0x320/0x3a8
[ 319.838320] blk_mq_tagset_busy_iter+0x8c/0xd8
[ 319.842760] scsi_host_busy+0x88/0xb8
[ 319.846418] show_host_busy+0x1c/0x48
[ 319.850079] dev_attr_show+0x44/0x90
[ 319.853655] sysfs_kf_seq_show+0x128/0x1c8
[ 319.857744] kernfs_seq_show+0xa0/0xb8
[ 319.861489] seq_read_iter+0x1ec/0x6a0
[ 319.865230] seq_read+0x1d0/0x250
[ 319.868539] kernfs_fop_read+0x70/0x330
[ 319.872369] vfs_read+0xe4/0x250
[ 319.875590] ksys_read+0xc8/0x178
[ 319.878898] __arm64_sys_read+0x44/0x58
[ 319.882730] el0_svc_common.constprop.2+0xc4/0x1e8
[ 319.887515] do_el0_svc+0x90/0xa0
[ 319.890824] el0_sync_handler+0x128/0x178
[ 319.894825] el0_sync+0x158/0x180
[ 319.898131]
[ 319.899614] The buggy address belongs to the page:
[ 319.904403] page:000000004e9e6864 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10b6bd2
[ 319.913876] flags: 0xbfffc0000000000()
[ 319.917626] raw: 0bfffc0000000000 0000000000000000 fffffe0000000000 0000000000000000
[ 319.925363] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 319.933096] page dumped because: kasan: bad access detected
[ 319.938658]
[ 319.940141] Memory state around the buggy address:
[ 319.944925] ffff0010b6bd2680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 319.952139] ffff0010b6bd2700: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 319.959354] >ffff0010b6bd2780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 319.966566]                                               ^
[ 319.972131] ffff0010b6bd2800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 319.979344] ffff0010b6bd2880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 319.986557]
==================================================================
[ 319.993770] Disabling lock debugging due to kernel taint
So to trigger this, I start fio on a disk, and then have one script
which constantly enables and disables an IO scheduler for that disk, and
another script which constantly reads /sys/class/scsi_host/host0/host_busy .
And in this problem, the driver tag we iterate may point to a stale IO
sched request.
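For reference, the reproducer described above can be sketched roughly as
follows. The disk name, host number, and scheduler choices are
assumptions for illustration; adjust them for the system under test.

```shell
#!/bin/sh
# Sketch of the reproducer: fio load on a disk, one loop toggling the
# I/O scheduler (freeing/reallocating sched requests), and another loop
# iterating the driver tags via host_busy. /dev/sda, host0 and
# mq-deadline are assumptions; adjust for your setup.

DISK=sda
HOST=host0

# Background I/O load on the disk.
fio --name=load --filename=/dev/$DISK --rw=randread --direct=1 \
    --ioengine=libaio --iodepth=32 --time_based --runtime=300 &

# Constantly switch the I/O scheduler for that disk.
while true; do
	echo mq-deadline > /sys/block/$DISK/queue/scheduler
	echo none > /sys/block/$DISK/queue/scheduler
done &

# Constantly read host_busy, which iterates the driver tags.
while true; do
	cat /sys/class/scsi_host/$HOST/host_busy > /dev/null
done
```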
Thanks,
John
[0]
https://lore.kernel.org/linux-block/9d4124ea-dbab-41cf-63bd-b17ef3e5037a@xxxxxxxxxx/