Re: Regression in 6.11-rc1 in scsi/sd?

From: Shinichiro Kawasaki
Date: Wed Jul 31 2024 - 22:05:40 EST


CC+: Christoph,

On Jul 31, 2024 / 13:22, Coelho, Luciano wrote:
> Hi,
>
> We're getting some lockdep splats with 6.11-rc1 in some of our older CI
> machines:
>
> <4>[ 25.357106] ======================================================
> <4>[ 25.358383] WARNING: possible circular locking dependency detected
> <4>[ 25.359636] 6.11.0-rc1-CI_DRM_15151-gb6f9528c7fff+ #1 Not tainted
> <4>[ 25.360902] ------------------------------------------------------
> <4>[ 25.362184] rc.local/609 is trying to acquire lock:
> <4>[ 25.363450] ffff888102358670 (&q->limits_lock){+.+.}-{3:3}, at: queue_max_discard_sectors_store+0x8e/0x110
> <4>[ 25.364798]
> but task is already holding lock:
> <4>[ 25.367410] ffff888102358550 (&q->sysfs_lock){+.+.}-{3:3}, at: queue_attr_store+0x45/0x90
> <4>[ 25.368773]
> which lock already depends on the new lock.
>
> ...during device probe. You can find the full dmesg, for example, here:
>
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_15151/fi-bsw-n3050/boot0.txt
>
> The stackdumps seem to point to sd_probe() and sd_revalidate_disk().
>
> Is this a known issue? Does anyone have any idea what is causing it?

I also observed a quite similar WARN when I ran the blktests test case
srp/002 on kernel v6.11-rc1 [2]. I bisected and found that the trigger is
commit 804e498e0496 ("sd: convert to the atomic queue limits API"). The
commit introduced a pair of calls to queue_limits_start_update() and
queue_limits_commit_update(), which lock and unlock q->limits_lock. I
believe this is what caused the circular locking dependency.
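
For reference, the pattern that commit introduced looks roughly like
this (a minimal sketch of the atomic queue limits API usage;
example_revalidate() is a hypothetical stand-in for sd_revalidate_disk(),
not the actual driver code):

#include <linux/blkdev.h>

static int example_revalidate(struct gendisk *disk)
{
	struct request_queue *q = disk->queue;
	struct queue_limits lim;

	/* Takes q->limits_lock and returns a snapshot of the limits. */
	lim = queue_limits_start_update(q);

	/* ... everything here runs with q->limits_lock held ... */

	/* Validates and applies lim, then releases q->limits_lock. */
	return queue_limits_commit_update(q, &lim);
}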

I took a closer look and found that sd_read_cpr() is called between
queue_limits_start_update() and queue_limits_commit_update(), and that
this is what creates the circular dependency: sd_read_cpr() is called
from sd_revalidate_disk() and calls disk_set_independent_access_ranges(),
which takes q->sysfs_dir_lock while q->limits_lock is still held. I have
created a fix candidate patch [1], which moves the sd_read_cpr() call out
of the region between the two queue_limits_*_update() calls. I confirmed
that the patch avoids the WARN on my test system. I will do some more
confirmation and then send it out as a formal patch for review.
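
To spell out the cycle from the lockdep report in [2], the three
dependency edges are roughly as follows (a simplified sketch; see the
full report below for the exact call chains):

  limits_lock --> sysfs_dir_lock:
    sd_revalidate_disk()                 /* q->limits_lock held since
      sd_read_cpr()                         queue_limits_start_update() */
        disk_set_independent_access_ranges()
          mutex_lock(&q->sysfs_dir_lock);

  sysfs_dir_lock --> sysfs_lock:
    blk_register_queue()                 /* q->sysfs_dir_lock held */
      mutex_lock(&q->sysfs_lock);

  sysfs_lock --> limits_lock:
    queue_attr_store()                   /* q->sysfs_lock held */
      queue_max_sectors_store()
        queue_limits_start_update(q);    /* takes q->limits_lock */

With sd_read_cpr() moved out of the limits_lock region, the first edge
disappears and the cycle is broken.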


[1]

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 049071b56819..7c5d681d234c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3666,7 +3666,6 @@ static int sd_revalidate_disk(struct gendisk *disk)
 		sd_read_block_limits_ext(sdkp);
 		sd_read_block_characteristics(sdkp, &lim);
 		sd_zbc_read_zones(sdkp, &lim, buffer);
-		sd_read_cpr(sdkp);
 	}
 
 	sd_print_capacity(sdkp, old_capacity);
@@ -3721,6 +3720,13 @@ static int sd_revalidate_disk(struct gendisk *disk)
 	if (err)
 		return err;
 
+	/*
+	 * Query concurrent positioning ranges after releasing the limits_lock
+	 * to avoid deadlock with sysfs_dir_lock and sysfs_lock.
+	 */
+	if (sdkp->media_present && scsi_device_supports_vpd(sdp))
+		sd_read_cpr(sdkp);
+
 	/*
 	 * For a zoned drive, revalidating the zones can be done only once
 	 * the gendisk capacity is set. So if this fails, set back the gendisk


[2]

Jul 29 18:27:12 testnode1 kernel: WARNING: possible circular locking dependency detected
Jul 29 18:27:12 testnode1 kernel: 6.11.0-rc1 #208 Not tainted
Jul 29 18:27:12 testnode1 kernel: ------------------------------------------------------
Jul 29 18:27:12 testnode1 kernel: multipathd/118209 is trying to acquire lock:
Jul 29 18:27:12 testnode1 kernel: ffff888107094740 (&q->limits_lock){+.+.}-{3:3}, at: queue_max_sectors_store+0x12b/0x290
Jul 29 18:27:12 testnode1 kernel:
but task is already holding lock:
Jul 29 18:27:12 testnode1 kernel: ffff888107094620 (&q->sysfs_lock){+.+.}-{3:3}, at: queue_attr_store+0x8b/0x110
Jul 29 18:27:12 testnode1 kernel:
which lock already depends on the new lock.
Jul 29 18:27:12 testnode1 kernel:
the existing dependency chain (in reverse order) is:
Jul 29 18:27:12 testnode1 kernel:
-> #2 (&q->sysfs_lock){+.+.}-{3:3}:
Jul 29 18:27:12 testnode1 kernel: __mutex_lock+0x18b/0x1220
Jul 29 18:27:12 testnode1 kernel: blk_register_queue+0x10a/0x4a0
Jul 29 18:27:12 testnode1 kernel: device_add_disk+0x646/0x1010
Jul 29 18:27:12 testnode1 kernel: sd_probe+0x94e/0xf30
Jul 29 18:27:12 testnode1 kernel: really_probe+0x1e3/0x8a0
Jul 29 18:27:12 testnode1 kernel: __driver_probe_device+0x18c/0x370
Jul 29 18:27:12 testnode1 kernel: driver_probe_device+0x4a/0x120
Jul 29 18:27:12 testnode1 kernel: __device_attach_driver+0x15e/0x270
Jul 29 18:27:12 testnode1 kernel: bus_for_each_drv+0x114/0x1a0
Jul 29 18:27:12 testnode1 kernel: __device_attach_async_helper+0x19c/0x240
Jul 29 18:27:12 testnode1 kernel: async_run_entry_fn+0x96/0x4f0
Jul 29 18:27:12 testnode1 kernel: process_one_work+0x85a/0x1400
Jul 29 18:27:12 testnode1 kernel: worker_thread+0x5e2/0xfc0
Jul 29 18:27:12 testnode1 kernel: kthread+0x2d1/0x3a0
Jul 29 18:27:12 testnode1 kernel: ret_from_fork+0x30/0x70
Jul 29 18:27:12 testnode1 kernel: ret_from_fork_asm+0x1a/0x30
Jul 29 18:27:12 testnode1 kernel:
-> #1 (&q->sysfs_dir_lock){+.+.}-{3:3}:
Jul 29 18:27:12 testnode1 kernel: __mutex_lock+0x18b/0x1220
Jul 29 18:27:12 testnode1 kernel: disk_set_independent_access_ranges+0x5e/0x690
Jul 29 18:27:12 testnode1 kernel: sd_revalidate_disk.isra.0+0x5872/0x8c70
Jul 29 18:27:12 testnode1 kernel: sd_open+0x33a/0x490
Jul 29 18:27:12 testnode1 kernel: blkdev_get_whole+0x92/0x200
Jul 29 18:27:12 testnode1 kernel: bdev_open+0x640/0xd20
Jul 29 18:27:12 testnode1 kernel: bdev_file_open_by_dev+0xc2/0x150
Jul 29 18:27:12 testnode1 kernel: disk_scan_partitions+0x18c/0x290
Jul 29 18:27:12 testnode1 kernel: device_add_disk+0xceb/0x1010
Jul 29 18:27:12 testnode1 kernel: sd_probe+0x94e/0xf30
Jul 29 18:27:12 testnode1 kernel: really_probe+0x1e3/0x8a0
Jul 29 18:27:12 testnode1 kernel: __driver_probe_device+0x18c/0x370
Jul 29 18:27:12 testnode1 kernel: driver_probe_device+0x4a/0x120
Jul 29 18:27:12 testnode1 kernel: __device_attach_driver+0x15e/0x270
Jul 29 18:27:12 testnode1 kernel: bus_for_each_drv+0x114/0x1a0
Jul 29 18:27:12 testnode1 kernel: __device_attach_async_helper+0x19c/0x240
Jul 29 18:27:12 testnode1 kernel: async_run_entry_fn+0x96/0x4f0
Jul 29 18:27:12 testnode1 kernel: process_one_work+0x85a/0x1400
Jul 29 18:27:12 testnode1 kernel: worker_thread+0x5e2/0xfc0
Jul 29 18:27:12 testnode1 kernel: kthread+0x2d1/0x3a0
Jul 29 18:27:12 testnode1 kernel: ret_from_fork+0x30/0x70
Jul 29 18:27:12 testnode1 kernel: ret_from_fork_asm+0x1a/0x30
Jul 29 18:27:12 testnode1 kernel:
-> #0 (&q->limits_lock){+.+.}-{3:3}:
Jul 29 18:27:12 testnode1 kernel: __lock_acquire+0x2b90/0x5990
Jul 29 18:27:12 testnode1 kernel: lock_acquire+0x1b1/0x520
Jul 29 18:27:12 testnode1 kernel: __mutex_lock+0x18b/0x1220
Jul 29 18:27:12 testnode1 kernel: queue_max_sectors_store+0x12b/0x290
Jul 29 18:27:12 testnode1 kernel: queue_attr_store+0xb5/0x110
Jul 29 18:27:12 testnode1 kernel: kernfs_fop_write_iter+0x3a4/0x5a0
Jul 29 18:27:12 testnode1 kernel: vfs_write+0x5e3/0xe70
Jul 29 18:27:12 testnode1 kernel: ksys_write+0xf7/0x1d0
Jul 29 18:27:12 testnode1 kernel: do_syscall_64+0x93/0x180
Jul 29 18:27:12 testnode1 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jul 29 18:27:12 testnode1 kernel:
other info that might help us debug this:
Jul 29 18:27:12 testnode1 kernel: Chain exists of:
&q->limits_lock --> &q->sysfs_dir_lock --> &q->sysfs_lock
Jul 29 18:27:12 testnode1 kernel: Possible unsafe locking scenario:
Jul 29 18:27:12 testnode1 kernel:        CPU0                    CPU1
Jul 29 18:27:12 testnode1 kernel:        ----                    ----
Jul 29 18:27:12 testnode1 kernel:   lock(&q->sysfs_lock);
Jul 29 18:27:12 testnode1 kernel:                                lock(&q->sysfs_dir_lock);
Jul 29 18:27:12 testnode1 kernel:                                lock(&q->sysfs_lock);
Jul 29 18:27:12 testnode1 kernel:   lock(&q->limits_lock);
Jul 29 18:27:12 testnode1 kernel:
*** DEADLOCK ***
Jul 29 18:27:12 testnode1 kernel: 5 locks held by multipathd/118209:
Jul 29 18:27:12 testnode1 kernel: #0: ffff888135f35ec8 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x1d8/0x2d0
Jul 29 18:27:12 testnode1 kernel: #1: ffff88811f7a2420 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xf7/0x1d0
Jul 29 18:27:12 testnode1 kernel: #2: ffff888105b57088 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x25f/0x5a0
Jul 29 18:27:12 testnode1 kernel: #3: ffff88817438af08 (kn->active#210){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x283/0x5a0
Jul 29 18:27:12 testnode1 kernel: #4: ffff888107094620 (&q->sysfs_lock){+.+.}-{3:3}, at: queue_attr_store+0x8b/0x110
Jul 29 18:27:12 testnode1 kernel:
stack backtrace:
Jul 29 18:27:12 testnode1 kernel: CPU: 1 UID: 0 PID: 118209 Comm: multipathd Not tainted 6.11.0-rc1 #208
Jul 29 18:27:12 testnode1 kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
Jul 29 18:27:12 testnode1 kernel: Call Trace:
Jul 29 18:27:12 testnode1 kernel: <TASK>
Jul 29 18:27:12 testnode1 kernel: dump_stack_lvl+0x6a/0x90
Jul 29 18:27:12 testnode1 kernel: check_noncircular+0x306/0x3f0
Jul 29 18:27:12 testnode1 kernel: ? __pfx_check_noncircular+0x10/0x10
Jul 29 18:27:12 testnode1 kernel: ? __pfx___bfs+0x10/0x10
Jul 29 18:27:12 testnode1 kernel: ? lockdep_lock+0xca/0x1c0
Jul 29 18:27:12 testnode1 kernel: ? __pfx_lockdep_lock+0x10/0x10
Jul 29 18:27:12 testnode1 kernel: __lock_acquire+0x2b90/0x5990
Jul 29 18:27:12 testnode1 kernel: ? __pfx___lock_acquire+0x10/0x10
Jul 29 18:27:12 testnode1 kernel: ? __pfx_check_irq_usage+0x10/0x10
Jul 29 18:27:12 testnode1 kernel: ? __pfx___bfs+0x10/0x10
Jul 29 18:27:12 testnode1 kernel: lock_acquire+0x1b1/0x520
Jul 29 18:27:12 testnode1 kernel: ? queue_max_sectors_store+0x12b/0x290
Jul 29 18:27:12 testnode1 kernel: ? __pfx_lock_acquire+0x10/0x10
Jul 29 18:27:12 testnode1 kernel: ? lock_is_held_type+0xd5/0x130
Jul 29 18:27:12 testnode1 kernel: ? __pfx___might_resched+0x10/0x10
Jul 29 18:27:12 testnode1 kernel: __mutex_lock+0x18b/0x1220
Jul 29 18:27:12 testnode1 kernel: ? queue_max_sectors_store+0x12b/0x290
Jul 29 18:27:12 testnode1 kernel: ? queue_max_sectors_store+0x12b/0x290
Jul 29 18:27:12 testnode1 kernel: ? __pfx___mutex_lock+0x10/0x10
Jul 29 18:27:12 testnode1 kernel: ? queue_max_sectors_store+0x12b/0x290
Jul 29 18:27:12 testnode1 kernel: queue_max_sectors_store+0x12b/0x290
Jul 29 18:27:12 testnode1 kernel: ? __pfx_lock_acquire+0x10/0x10
Jul 29 18:27:12 testnode1 kernel: ? lock_is_held_type+0xd5/0x130
Jul 29 18:27:12 testnode1 kernel: ? rcu_is_watching+0x11/0xb0
Jul 29 18:27:12 testnode1 kernel: ? __pfx_queue_max_sectors_store+0x10/0x10
Jul 29 18:27:12 testnode1 kernel: ? __mutex_lock+0x433/0x1220
Jul 29 18:27:12 testnode1 kernel: ? queue_attr_store+0x8b/0x110
Jul 29 18:27:12 testnode1 kernel: ? __pfx_autoremove_wake_function+0x10/0x10
Jul 29 18:27:12 testnode1 kernel: ? _raw_spin_unlock_irqrestore+0x4c/0x60
Jul 29 18:27:12 testnode1 kernel: queue_attr_store+0xb5/0x110
Jul 29 18:27:12 testnode1 kernel: ? __pfx_sysfs_kf_write+0x10/0x10
Jul 29 18:27:12 testnode1 kernel: kernfs_fop_write_iter+0x3a4/0x5a0
Jul 29 18:27:12 testnode1 kernel: vfs_write+0x5e3/0xe70
Jul 29 18:27:12 testnode1 kernel: ? __pfx_vfs_write+0x10/0x10
Jul 29 18:27:12 testnode1 kernel: ksys_write+0xf7/0x1d0
Jul 29 18:27:12 testnode1 kernel: ? __pfx_ksys_write+0x10/0x10
Jul 29 18:27:12 testnode1 kernel: ? kasan_quarantine_put+0xe1/0x1f0
Jul 29 18:27:12 testnode1 kernel: do_syscall_64+0x93/0x180
Jul 29 18:27:12 testnode1 kernel: ? do_sys_openat2+0x125/0x180
Jul 29 18:27:12 testnode1 kernel: ? __pfx_do_sys_openat2+0x10/0x10
Jul 29 18:27:12 testnode1 kernel: ? lock_release+0x461/0x7b0
Jul 29 18:27:12 testnode1 kernel: ? __pfx_lock_release+0x10/0x10
Jul 29 18:27:12 testnode1 kernel: ? __x64_sys_openat+0x105/0x1d0
Jul 29 18:27:12 testnode1 kernel: ? __pfx___x64_sys_openat+0x10/0x10
Jul 29 18:27:12 testnode1 kernel: ? lockdep_hardirqs_on_prepare+0x16d/0x400
Jul 29 18:27:12 testnode1 kernel: ? do_syscall_64+0x9f/0x180
Jul 29 18:27:12 testnode1 kernel: ? lockdep_hardirqs_on+0x78/0x100
Jul 29 18:27:12 testnode1 kernel: ? do_syscall_64+0x9f/0x180
Jul 29 18:27:12 testnode1 kernel: ? lockdep_hardirqs_on_prepare+0x16d/0x400
Jul 29 18:27:12 testnode1 kernel: ? do_syscall_64+0x9f/0x180
Jul 29 18:27:12 testnode1 kernel: ? lockdep_hardirqs_on+0x78/0x100
Jul 29 18:27:12 testnode1 kernel: ? do_syscall_64+0x9f/0x180
Jul 29 18:27:12 testnode1 kernel: ? do_syscall_64+0x9f/0x180
Jul 29 18:27:12 testnode1 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jul 29 18:27:12 testnode1 kernel: RIP: 0033:0x7f6908e1989d
Jul 29 18:27:12 testnode1 kernel: Code: e5 48 83 ec 20 48 89 55 e8 48 89 75 f0 89 7d f8 e8 48 69 f8 ff 48 8b 55 e8 48 8b 75 f0 41 89 c0 8b 7d f8 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 45 f8 e8 9f 69 f8 ff 48 8b
Jul 29 18:27:12 testnode1 kernel: RSP: 002b:00007f6908914620 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
Jul 29 18:27:12 testnode1 kernel: RAX: ffffffffffffffda RBX: 00007f69089156ed RCX: 00007f6908e1989d
Jul 29 18:27:12 testnode1 kernel: RDX: 0000000000000003 RSI: 00007f69089156ed RDI: 000000000000000e
Jul 29 18:27:12 testnode1 kernel: RBP: 00007f6908914640 R08: 0000000000000000 R09: 00000000ffffffff
Jul 29 18:27:12 testnode1 kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000003
Jul 29 18:27:12 testnode1 kernel: R13: 00007f6908914660 R14: 000055da2ccc80c8 R15: 00007f68e001d3e0
Jul 29 18:27:12 testnode1 kernel: </TASK>