Re: [PATCH V2 3/3] block: model freeze & enter queue as lock for supporting lockdep

From: Marek Szyprowski
Date: Tue Oct 29 2024 - 07:13:59 EST


On 25.10.2024 02:37, Ming Lei wrote:
> Recently we got several deadlock report[1][2][3] caused by
> blk_mq_freeze_queue and blk_enter_queue().
>
> It turns out the two behave just like acquiring a read/write lock, so
> model them as read/write locks to support lockdep:
>
> 1) model q->q_usage_counter as two locks (io lock and queue lock)
>
> - queue lock covers sync with blk_enter_queue()
>
> - io lock covers sync with bio_enter_queue()
>
> 2) make the lockdep class/key per-queue:
>
> - different subsystems have very different lock use patterns; a shared
> lock class easily causes false positives
>
> - freeze_queue degrades to no lock when the disk state becomes DEAD,
> because bio_enter_queue() won't be blocked any more
>
> - freeze_queue degrades to no lock when the request queue becomes dying,
> because blk_enter_queue() won't be blocked any more
>
> 3) model blk_mq_freeze_queue() as acquire_exclusive & try_lock
> - it is an exclusive lock, so the dependency with blk_enter_queue() is covered
>
> - it is a trylock, because multiple blk_mq_freeze_queue() calls are
> allowed to run concurrently
>
> 4) model blk_enter_queue() & bio_enter_queue() as acquire_read()
> - nested blk_enter_queue() calls are allowed
>
> - dependency with blk_mq_freeze_queue() is covered
>
> - blk_queue_exit() is often called from other contexts (such as irq), so
> it can't be annotated as lock_release(); instead, simply do the release
> in blk_enter_queue(), which still covers as many cases as possible
>
> With lockdep support, such reports can be raised immediately, without
> waiting until the real deadlock is triggered.
>
> For example, a lockdep report can be triggered for case [3] with this
> patch applied.
>
> [1] occasional block layer hang when setting 'echo noop > /sys/block/sda/queue/scheduler'
> https://bugzilla.kernel.org/show_bug.cgi?id=219166
>
> [2] del_gendisk() vs blk_queue_enter() race condition
> https://lore.kernel.org/linux-block/20241003085610.GK11458@xxxxxxxxxx/
>
> [3] queue_freeze & queue_enter deadlock in scsi
> https://lore.kernel.org/linux-block/ZxG38G9BuFdBpBHZ@fedora/T/#u
>
> Reviewed-by: Christoph Hellwig <hch@xxxxxx>
> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>

This patch landed yesterday in linux-next as commit f1be1788a32e
("block: model freeze & enter queue as lock for supporting lockdep").
In my tests I found that it introduces the following two lockdep warnings:

1. On a Samsung Exynos 4412-based Odroid U3 board (32-bit ARM), observed
during boot:

======================================================
WARNING: possible circular locking dependency detected
6.12.0-rc4-00037-gf1be1788a32e #9290 Not tainted
------------------------------------------------------
find/1284 is trying to acquire lock:
cf3b8534 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x30/0x70

but task is already holding lock:
c203a0c8 (&sb->s_type->i_mutex_key#2){++++}-{3:3}, at:
iterate_dir+0x30/0x140

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #4 (&sb->s_type->i_mutex_key#2){++++}-{3:3}:
       down_write+0x44/0xc4
       start_creating+0x8c/0x170
       debugfs_create_dir+0x1c/0x178
       blk_register_queue+0xa0/0x1c0
       add_disk_fwnode+0x210/0x434
       brd_alloc+0x1cc/0x210
       brd_init+0xac/0x104
       do_one_initcall+0x64/0x30c
       kernel_init_freeable+0x1c4/0x228
       kernel_init+0x1c/0x12c
       ret_from_fork+0x14/0x28

-> #3 (&q->debugfs_mutex){+.+.}-{3:3}:
       __mutex_lock+0x94/0x94c
       mutex_lock_nested+0x1c/0x24
       blk_mq_init_sched+0x140/0x204
       elevator_init_mq+0xb8/0x130
       add_disk_fwnode+0x3c/0x434
       mmc_blk_alloc_req+0x34c/0x464
       mmc_blk_probe+0x1f4/0x6f8
       really_probe+0xe0/0x3d8
       __driver_probe_device+0x9c/0x1e4
       driver_probe_device+0x30/0xc0
       __device_attach_driver+0xa8/0x120
       bus_for_each_drv+0x80/0xcc
       __device_attach+0xac/0x1fc
       bus_probe_device+0x8c/0x90
       device_add+0x5a4/0x7cc
       mmc_add_card+0x118/0x2c8
       mmc_attach_mmc+0xd8/0x174
       mmc_rescan+0x2ec/0x3a8
       process_one_work+0x240/0x6d0
       worker_thread+0x1a0/0x398
       kthread+0x104/0x138
       ret_from_fork+0x14/0x28

-> #2 (&q->q_usage_counter(io)#17){++++}-{0:0}:
       blk_mq_submit_bio+0x8dc/0xb34
       __submit_bio+0x78/0x148
       submit_bio_noacct_nocheck+0x204/0x32c
       ext4_mpage_readpages+0x558/0x7b0
       read_pages+0x64/0x28c
       page_cache_ra_unbounded+0x118/0x1bc
       filemap_get_pages+0x124/0x7ec
       filemap_read+0x174/0x5b0
       __kernel_read+0x128/0x2c0
       bprm_execve+0x230/0x7a4
       kernel_execve+0xec/0x194
       try_to_run_init_process+0xc/0x38
       kernel_init+0xdc/0x12c
       ret_from_fork+0x14/0x28

-> #1 (mapping.invalidate_lock#2){++++}-{3:3}:
       down_read+0x44/0x224
       filemap_fault+0x648/0x10f0
       __do_fault+0x38/0xd4
       handle_mm_fault+0xaf8/0x14d0
       do_page_fault+0xe0/0x5c8
       do_DataAbort+0x3c/0xb0
       __dabt_svc+0x50/0x80
       __clear_user_std+0x34/0x68
       elf_load+0x1a8/0x208
       load_elf_binary+0x3f4/0x13cc
       bprm_execve+0x28c/0x7a4
       do_execveat_common+0x150/0x198
       sys_execve+0x30/0x38
       ret_fast_syscall+0x0/0x1c

-> #0 (&mm->mmap_lock){++++}-{3:3}:
       __lock_acquire+0x158c/0x2970
       lock_acquire+0x130/0x384
       __might_fault+0x50/0x70
       filldir64+0x94/0x28c
       dcache_readdir+0x174/0x260
       iterate_dir+0x64/0x140
       sys_getdents64+0x60/0x130
       ret_fast_syscall+0x0/0x1c

other info that might help us debug this:

Chain exists of:
  &mm->mmap_lock --> &q->debugfs_mutex --> &sb->s_type->i_mutex_key#2

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  rlock(&sb->s_type->i_mutex_key#2);
                               lock(&q->debugfs_mutex);
lock(&sb->s_type->i_mutex_key#2);
  rlock(&mm->mmap_lock);

 *** DEADLOCK ***

2 locks held by find/1284:
 #0: c3df1e88 (&f->f_pos_lock){+.+.}-{3:3}, at: fdget_pos+0x88/0xd0
 #1: c203a0c8 (&sb->s_type->i_mutex_key#2){++++}-{3:3}, at:
iterate_dir+0x30/0x140

stack backtrace:
CPU: 1 UID: 0 PID: 1284 Comm: find Not tainted
6.12.0-rc4-00037-gf1be1788a32e #9290
Hardware name: Samsung Exynos (Flattened Device Tree)
Call trace:
 unwind_backtrace from show_stack+0x10/0x14
 show_stack from dump_stack_lvl+0x68/0x88
 dump_stack_lvl from print_circular_bug+0x31c/0x394
 print_circular_bug from check_noncircular+0x16c/0x184
 check_noncircular from __lock_acquire+0x158c/0x2970
 __lock_acquire from lock_acquire+0x130/0x384
 lock_acquire from __might_fault+0x50/0x70
 __might_fault from filldir64+0x94/0x28c
 filldir64 from dcache_readdir+0x174/0x260
 dcache_readdir from iterate_dir+0x64/0x140
 iterate_dir from sys_getdents64+0x60/0x130
 sys_getdents64 from ret_fast_syscall+0x0/0x1c
Exception stack(0xf22b5fa8 to 0xf22b5ff0)
5fa0:                   004b4fa0 004b4f80 00000004 004b4fa0 00008000
00000000
5fc0: 004b4fa0 004b4f80 00000001 000000d9 00000000 004b4af0 00000000
000010ea
5fe0: 004b1eb4 bea05af0 b6da4b08 b6da4a28

--->8---


2. On QEMU's ARM64 virt machine, observed during a system suspend/resume
cycle:

# time rtcwake -s10 -mmem
rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Oct 29 11:54:30 2024
PM: suspend entry (s2idle)
Filesystems sync: 0.004 seconds
Freezing user space processes
Freezing user space processes completed (elapsed 0.007 seconds)
OOM killer disabled.
Freezing remaining freezable tasks
Freezing remaining freezable tasks completed (elapsed 0.004 seconds)

======================================================
WARNING: possible circular locking dependency detected
6.12.0-rc4+ #9291 Not tainted
------------------------------------------------------
rtcwake/1299 is trying to acquire lock:
ffff80008358a7f8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x1c/0x28

but task is already holding lock:
ffff000006136d68 (&q->q_usage_counter(io)#5){++++}-{0:0}, at:
virtblk_freeze+0x24/0x60

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (&q->q_usage_counter(io)#5){++++}-{0:0}:
       blk_mq_submit_bio+0x7c0/0x9d8
       __submit_bio+0x74/0x164
       submit_bio_noacct_nocheck+0x2d4/0x3b4
       submit_bio_noacct+0x148/0x3fc
       submit_bio+0x130/0x204
       submit_bh_wbc+0x148/0x1bc
       submit_bh+0x18/0x24
       ext4_read_bh_nowait+0x70/0x100
       ext4_sb_breadahead_unmovable+0x50/0x80
       __ext4_get_inode_loc+0x354/0x654
       ext4_get_inode_loc+0x40/0xa8
       ext4_reserve_inode_write+0x40/0xf0
       __ext4_mark_inode_dirty+0x88/0x300
       ext4_ext_tree_init+0x40/0x4c
       __ext4_new_inode+0x99c/0x1614
       ext4_create+0xe4/0x1d4
       lookup_open.isra.0+0x47c/0x540
       path_openat+0x370/0x9e8
       do_filp_open+0x80/0x130
       do_sys_openat2+0xb4/0xe8
       __arm64_compat_sys_openat+0x5c/0xa4
       invoke_syscall+0x48/0x110
       el0_svc_common.constprop.0+0x40/0xe8
       do_el0_svc_compat+0x20/0x3c
       el0_svc_compat+0x44/0xe0
       el0t_32_sync_handler+0x98/0x148
       el0t_32_sync+0x194/0x198

-> #2 (jbd2_handle){++++}-{0:0}:
       start_this_handle+0x178/0x4e8
       jbd2__journal_start+0x110/0x298
       __ext4_journal_start_sb+0x9c/0x274
       ext4_dirty_inode+0x38/0x88
       __mark_inode_dirty+0x90/0x568
       generic_update_time+0x50/0x64
       touch_atime+0x2c0/0x324
       ext4_file_mmap+0x68/0x88
       mmap_region+0x448/0xa38
       do_mmap+0x3dc/0x540
       vm_mmap_pgoff+0xf8/0x1b4
       ksys_mmap_pgoff+0x148/0x1f0
       __arm64_compat_sys_aarch32_mmap2+0x20/0x2c
       invoke_syscall+0x48/0x110
       el0_svc_common.constprop.0+0x40/0xe8
       do_el0_svc_compat+0x20/0x3c
       el0_svc_compat+0x44/0xe0
       el0t_32_sync_handler+0x98/0x148
       el0t_32_sync+0x194/0x198

-> #1 (&mm->mmap_lock){++++}-{3:3}:
       __might_fault+0x5c/0x80
       inet_gifconf+0xcc/0x278
       dev_ifconf+0xc4/0x1f8
       sock_ioctl+0x234/0x384
       compat_sock_ioctl+0x180/0x35c
       __arm64_compat_sys_ioctl+0x14c/0x16c
       invoke_syscall+0x48/0x110
       el0_svc_common.constprop.0+0x40/0xe8
       do_el0_svc_compat+0x20/0x3c
       el0_svc_compat+0x44/0xe0
       el0t_32_sync_handler+0x98/0x148
       el0t_32_sync+0x194/0x198

-> #0 (rtnl_mutex){+.+.}-{3:3}:
       __lock_acquire+0x1374/0x2224
       lock_acquire+0x200/0x340
       __mutex_lock+0x98/0x428
       mutex_lock_nested+0x24/0x30
       rtnl_lock+0x1c/0x28
       virtnet_freeze_down.isra.0+0x20/0x9c
       virtnet_freeze+0x44/0x60
       virtio_device_freeze+0x68/0x94
       virtio_mmio_freeze+0x14/0x20
       platform_pm_suspend+0x2c/0x6c
       dpm_run_callback+0xa4/0x218
       device_suspend+0x12c/0x52c
       dpm_suspend+0x10c/0x2e4
       dpm_suspend_start+0x70/0x78
       suspend_devices_and_enter+0x130/0x798
       pm_suspend+0x1ac/0x2e8
       state_store+0x8c/0x110
       kobj_attr_store+0x18/0x2c
       sysfs_kf_write+0x4c/0x78
       kernfs_fop_write_iter+0x120/0x1b4
       vfs_write+0x2b0/0x35c
       ksys_write+0x68/0xf4
       __arm64_sys_write+0x1c/0x28
       invoke_syscall+0x48/0x110
       el0_svc_common.constprop.0+0x40/0xe8
       do_el0_svc_compat+0x20/0x3c
       el0_svc_compat+0x44/0xe0
       el0t_32_sync_handler+0x98/0x148
       el0t_32_sync+0x194/0x198

other info that might help us debug this:

Chain exists of:
  rtnl_mutex --> jbd2_handle --> &q->q_usage_counter(io)#5

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&q->q_usage_counter(io)#5);
                               lock(jbd2_handle);
lock(&q->q_usage_counter(io)#5);
  lock(rtnl_mutex);

 *** DEADLOCK ***

9 locks held by rtcwake/1299:
 #0: ffff0000069103f8 (sb_writers#7){.+.+}-{0:0}, at: vfs_write+0x1e4/0x35c
 #1: ffff000007c0ae88 (&of->mutex#2){+.+.}-{3:3}, at:
kernfs_fop_write_iter+0xf0/0x1b4
 #2: ffff0000046906e8 (kn->active#24){.+.+}-{0:0}, at:
kernfs_fop_write_iter+0xf8/0x1b4
 #3: ffff800082fd9908 (system_transition_mutex){+.+.}-{3:3}, at:
pm_suspend+0x88/0x2e8
 #4: ffff000006137678 (&q->q_usage_counter(io)#6){++++}-{0:0}, at:
virtblk_freeze+0x24/0x60
 #5: ffff0000061376b0 (&q->q_usage_counter(queue)#2){++++}-{0:0}, at:
virtblk_freeze+0x24/0x60
 #6: ffff000006136d68 (&q->q_usage_counter(io)#5){++++}-{0:0}, at:
virtblk_freeze+0x24/0x60
 #7: ffff000006136da0 (&q->q_usage_counter(queue)){++++}-{0:0}, at:
virtblk_freeze+0x24/0x60
 #8: ffff0000056208f8 (&dev->mutex){....}-{3:3}, at:
device_suspend+0xf8/0x52c

stack backtrace:
CPU: 0 UID: 0 PID: 1299 Comm: rtcwake Not tainted 6.12.0-rc4+ #9291
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x94/0xec
 show_stack+0x18/0x24
 dump_stack_lvl+0x90/0xd0
 dump_stack+0x18/0x24
 print_circular_bug+0x298/0x37c
 check_noncircular+0x15c/0x170
 __lock_acquire+0x1374/0x2224
 lock_acquire+0x200/0x340
 __mutex_lock+0x98/0x428
 mutex_lock_nested+0x24/0x30
 rtnl_lock+0x1c/0x28
 virtnet_freeze_down.isra.0+0x20/0x9c
 virtnet_freeze+0x44/0x60
 virtio_device_freeze+0x68/0x94
 virtio_mmio_freeze+0x14/0x20
 platform_pm_suspend+0x2c/0x6c
 dpm_run_callback+0xa4/0x218
 device_suspend+0x12c/0x52c
 dpm_suspend+0x10c/0x2e4
 dpm_suspend_start+0x70/0x78
 suspend_devices_and_enter+0x130/0x798
 pm_suspend+0x1ac/0x2e8
 state_store+0x8c/0x110
 kobj_attr_store+0x18/0x2c
 sysfs_kf_write+0x4c/0x78
 kernfs_fop_write_iter+0x120/0x1b4
 vfs_write+0x2b0/0x35c
 ksys_write+0x68/0xf4
 __arm64_sys_write+0x1c/0x28
 invoke_syscall+0x48/0x110
 el0_svc_common.constprop.0+0x40/0xe8
 do_el0_svc_compat+0x20/0x3c
 el0_svc_compat+0x44/0xe0
 el0t_32_sync_handler+0x98/0x148
 el0t_32_sync+0x194/0x198
virtio_blk virtio2: 1/0/0 default/read/poll queues
virtio_blk virtio3: 1/0/0 default/read/poll queues
OOM killer enabled.
Restarting tasks ... done.
random: crng reseeded on system resumption
PM: suspend exit

--->8---

Let me know if you need more information about my test systems.

> ---
> block/blk-core.c | 18 ++++++++++++++++--
> block/blk-mq.c | 26 ++++++++++++++++++++++----
> block/blk.h | 29 ++++++++++++++++++++++++++---
> block/genhd.c | 15 +++++++++++----
> include/linux/blkdev.h | 6 ++++++
> 5 files changed, 81 insertions(+), 13 deletions(-)

> ...

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland