Re: resctrl mount fail on v6.13-rc1

From: Reinette Chatre
Date: Tue Dec 03 2024 - 00:02:59 EST




On 12/2/24 8:54 PM, Reinette Chatre wrote:
>
>
> On 12/2/24 6:47 PM, Luck, Tony wrote:
>> On Mon, Dec 02, 2024 at 02:26:48PM -0800, Reinette Chatre wrote:
>>> Hi Tony,
>>>
>>> On 12/2/24 1:42 PM, Luck, Tony wrote:
>>>> Can anyone better at decoding lockdep dumps than me make sense of this?
>>>>
>>>> All I did was build v6.13-rc1 with (among others)
>>>>
>>>> CONFIG_PROVE_LOCKING=y
>>>> CONFIG_PROVE_RAW_LOCK_NESTING=y
>>>> CONFIG_PROVE_RCU=y
>>>>
>>>> and then mount the resctrl filesystem:
>>>>
>>>> $ sudo mount -t resctrl resctrl /sys/fs/resctrl
>>>>
>>>> There are only trivial changes to the resctrl code between
>>>> v6.12 (which works) and v6.13-rc1:
>>>>
>>>> $ git log --oneline v6.13-rc1 ^v6.12 -- arch/x86/kernel/cpu/resctrl
>>>> 5a4b3fbb4849 Merge tag 'x86_cache_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>>> 9bce6e94c4b3 x86/resctrl: Support Sub-NUMA cluster mode SNC6
>>>> 29eaa7958367 x86/resctrl: Slightly clean-up mbm_config_show()
>>>>
>>>> So something in kernfs? Or the way resctrl uses kernfs?
>>>
>>> I am not seeing this but that may be because I am not testing with
>>> selinux enabled. My test kernel has:
>>> # CONFIG_SECURITY_SELINUX is not set
>>>
>>> I am also not running with any btrfs filesystems.
>>>
>>> Is this your usual setup in which you are seeing this for the first time? Is it
>>> perhaps possible for you to bisect?
>>
>> Bisection says:
>>
>> $ git bisect bad
>> f1be1788a32e8fa63416ad4518bbd1a85a825c9d is the first bad commit
>> commit f1be1788a32e8fa63416ad4518bbd1a85a825c9d
>> Author: Ming Lei <ming.lei@xxxxxxxxxx>
>> Date: Fri Oct 25 08:37:20 2024 +0800
>>
>> block: model freeze & enter queue as lock for supporting lockdep
>>
>
> Thank you very much, Tony. Since you did not respond to the question about
> bisecting, I assumed that you would not do it. I ended up duplicating the bisect
> effort after getting an environment in which I can reproduce the issue. Doing so
> I was able to confirm the commit pointed to by the bisect.
> The commit cannot be reverted cleanly, so I could not test v6.13-rc1 with it
> reverted.
>
> Ming Lei: I'd be happy to help with testing if you do not have hardware with
> which you can reproduce the issue.

One datapoint that I neglected to mention: btrfs does not seem to be required. The system
I tested on used an ext4 filesystem, resulting in the trace below:

[ 67.598375] ======================================================
[ 67.604562] WARNING: possible circular locking dependency detected
[ 67.610741] 6.12.0-rc4-00037-gf1be1788a32e #1 Not tainted
[ 67.616146] ------------------------------------------------------
[ 67.622326] mount/2738 is trying to acquire lock:
[ 67.627031] ff2f341adb984578 (&sbsec->lock){+.+.}-{4:4}, at: selinux_set_mnt_opts+0x71/0x730
[ 67.635481]
but task is already holding lock:
[ 67.641312] ff2f341ad746a0e0 (&type->s_umount_key#65/1){+.+.}-{4:4}, at: alloc_super+0xd9/0x3d0
[ 67.650013]
which lock already depends on the new lock.

[ 67.658183]
the existing dependency chain (in reverse order) is:
[ 67.665654]
-> #5 (&type->s_umount_key#65/1){+.+.}-{4:4}:
[ 67.672525] reacquire_held_locks+0xce/0x1f0
[ 67.677317] lock_release+0x11e/0x2b0
[ 67.681503] __mutex_unlock_slowpath+0x3b/0x290
[ 67.686558] rdt_get_tree+0x1b9/0x5c0
[ 67.690750] vfs_get_tree+0x29/0xf0
[ 67.694762] vfs_cmd_create+0x59/0xe0
[ 67.698948] __do_sys_fsconfig+0x4f3/0x6c0
[ 67.703567] do_syscall_64+0xc5/0x210
[ 67.707763] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 67.713341]
-> #4 (cpu_hotplug_lock){++++}-{0:0}:
[ 67.719523] lock_acquire.part.0+0x69/0x1b0
[ 67.724226] __cpuhp_state_add_instance+0x3b/0x1c0
[ 67.729546] blk_mq_alloc_and_init_hctx+0x44d/0x4c0
[ 67.734945] blk_mq_realloc_hw_ctxs+0x15e/0x190
[ 67.739998] blk_mq_init_allocated_queue+0x138/0x480
[ 67.745485] blk_mq_alloc_queue+0x7b/0xe0
[ 67.750015] scsi_alloc_sdev+0x281/0x3c0
[ 67.754463] scsi_probe_and_add_lun+0x1f5/0x450
[ 67.759514] __scsi_add_device+0x10f/0x130
[ 67.764133] ata_scsi_scan_host+0x9c/0x1b0
[ 67.768753] async_run_entry_fn+0x31/0x130
[ 67.773378] process_one_work+0x204/0x590
[ 67.777912] worker_thread+0x180/0x340
[ 67.782182] kthread+0xd0/0x100
[ 67.785848] ret_from_fork+0x31/0x50
[ 67.789947] ret_from_fork_asm+0x1a/0x30
[ 67.794394]
-> #3 (&q->sysfs_lock){+.+.}-{4:4}:
[ 67.800399] lock_acquire.part.0+0x69/0x1b0
[ 67.805103] __mutex_lock+0xaf/0x740
[ 67.809205] queue_attr_store+0x61/0xb0
[ 67.813573] kernfs_fop_write_iter+0x13a/0x200
[ 67.818546] vfs_write+0x29c/0x540
[ 67.822470] ksys_write+0x73/0xf0
[ 67.826311] do_syscall_64+0xc5/0x210
[ 67.830496] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 67.836069]
-> #2 (&q->q_usage_counter(io)){++++}-{0:0}:
[ 67.842852] lock_acquire.part.0+0x69/0x1b0
[ 67.847557] blk_mq_submit_bio+0x81b/0xab0
[ 67.852179] __submit_bio+0x97/0x180
[ 67.856277] submit_bio_noacct_nocheck+0x324/0x430
[ 67.861589] ext4_read_bh+0x51/0x90
[ 67.865601] ext4_sb_bread+0x75/0x90
[ 67.869699] ext4_xattr_get+0xf5/0x250
[ 67.873973] __vfs_getxattr+0x7f/0xb0
[ 67.878159] inode_doinit_use_xattr+0x63/0x170
[ 67.883124] inode_doinit_with_dentry+0x36b/0x530
[ 67.888348] security_d_instantiate+0x39/0x50
[ 67.893228] d_splice_alias+0x52/0x4e0
[ 67.897499] ext4_lookup+0x1e1/0x270
[ 67.901599] lookup_one_qstr_excl+0x6f/0xa0
[ 67.906306] filename_create+0xcb/0x1b0
[ 67.910665] do_mkdirat+0x5c/0x140
[ 67.914589] __x64_sys_mkdir+0x46/0x70
[ 67.918863] do_syscall_64+0xc5/0x210
[ 67.923056] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 67.928629]
-> #1 (&ei->xattr_sem){++++}-{4:4}:
[ 67.934633] lock_acquire.part.0+0x69/0x1b0
[ 67.939340] down_read+0x31/0x150
[ 67.943178] ext4_xattr_get+0x71/0x250
[ 67.947448] __vfs_getxattr+0x7f/0xb0
[ 67.951634] sb_finish_set_opts+0x1ab/0x290
[ 67.956341] selinux_set_mnt_opts+0x4d9/0x730
[ 67.961220] iterate_supers+0x7b/0xf0
[ 67.965405] selinux_policy_commit+0x24d/0x2b0
[ 67.970373] sel_write_load+0x4ee/0xbc0
[ 67.974739] vfs_write+0xe5/0x540
[ 67.978577] ksys_write+0x73/0xf0
[ 67.982415] do_syscall_64+0xc5/0x210
[ 67.986604] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 67.992175]
-> #0 (&sbsec->lock){+.+.}-{4:4}:
[ 67.998006] check_prev_add+0xeb/0xd80
[ 68.002280] __lock_acquire+0x11b6/0x1640
[ 68.006811] lock_acquire.part.0+0x69/0x1b0
[ 68.011515] __mutex_lock+0xaf/0x740
[ 68.015615] selinux_set_mnt_opts+0x71/0x730
[ 68.020407] security_sb_set_mnt_opts+0x54/0x90
[ 68.025460] vfs_get_tree+0x81/0xf0
[ 68.029472] vfs_cmd_create+0x59/0xe0
[ 68.033656] __do_sys_fsconfig+0x4f3/0x6c0
[ 68.038276] do_syscall_64+0xc5/0x210
[ 68.042461] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 68.048036]
other info that might help us debug this:

[ 68.056033] Chain exists of:
&sbsec->lock --> cpu_hotplug_lock --> &type->s_umount_key#65/1

[ 68.067419] Possible unsafe locking scenario:

[ 68.073331]        CPU0                    CPU1
[ 68.077863]        ----                    ----
[ 68.082393]   lock(&type->s_umount_key#65/1);
[ 68.086752]                                lock(cpu_hotplug_lock);
[ 68.092931]                                lock(&type->s_umount_key#65/1);
[ 68.099806]   lock(&sbsec->lock);
[ 68.103122]
*** DEADLOCK ***

[ 68.109033] 2 locks held by mount/2738:
[ 68.112873] #0: ff2f341adbb5b070 (&fc->uapi_mutex){+.+.}-{4:4}, at: __do_sys_fsconfig+0x4bf/0x6c0
[ 68.121831] #1: ff2f341ad746a0e0 (&type->s_umount_key#65/1){+.+.}-{4:4}, at: alloc_super+0xd9/0x3d0
[ 68.130966]
stack backtrace:
[ 68.135327] CPU: 109 UID: 0 PID: 2738 Comm: mount Not tainted 6.12.0-rc4-00037-gf1be1788a32e #1
[ 68.144017] Hardware name: Intel Corporation WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0027.P82.2204080829 04/08/2022
[ 68.154528] Call Trace:
[ 68.156973] <TASK>
[ 68.159078] dump_stack_lvl+0x5d/0x80
[ 68.162744] print_circular_bug.cold+0x178/0x1be
[ 68.167364] check_noncircular+0x14e/0x170
[ 68.171466] check_prev_add+0xeb/0xd80
[ 68.175223] __lock_acquire+0x11b6/0x1640
[ 68.179238] lock_acquire.part.0+0x69/0x1b0
[ 68.183423] ? selinux_set_mnt_opts+0x71/0x730
[ 68.187866] ? selinux_set_mnt_opts+0x71/0x730
[ 68.192313] __mutex_lock+0xaf/0x740
[ 68.195892] ? selinux_set_mnt_opts+0x71/0x730
[ 68.200338] ? selinux_set_mnt_opts+0x71/0x730
[ 68.204782] ? selinux_set_mnt_opts+0x71/0x730
[ 68.209228] ? lock_is_held_type+0x85/0xf0
[ 68.213328] selinux_set_mnt_opts+0x71/0x730
[ 68.217599] security_sb_set_mnt_opts+0x54/0x90
[ 68.222134] vfs_get_tree+0x81/0xf0
[ 68.225623] ? capable+0x3a/0x60
[ 68.228855] vfs_cmd_create+0x59/0xe0
[ 68.232524] __do_sys_fsconfig+0x4f3/0x6c0
[ 68.236623] do_syscall_64+0xc5/0x210
[ 68.240296] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 68.245348] RIP: 0033:0x7f1fcd171d5e
[ 68.248928] Code: 73 01 c3 48 8b 0d b2 40 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 49 89 ca b8 af 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 82 40 0f 00 f7 d8 64 89 01 48
[ 68.267672] RSP: 002b:00007ffde8bb54b8 EFLAGS: 00000246 ORIG_RAX: 00000000000001af
[ 68.275245] RAX: ffffffffffffffda RBX: 0000557854ef36c0 RCX: 00007f1fcd171d5e
[ 68.282377] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003
[ 68.289507] RBP: 00007ffde8bb5600 R08: 0000000000000000 R09: 0000000000000001
[ 68.296641] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f1fcd2ecb00
[ 68.303770] R13: 0000000000000000 R14: 0000557854ef47a0 R15: 00007f1fcd2e1561
[ 68.310899] </TASK>
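
For anyone who wants to reproduce this, the steps quoted earlier in the thread
boil down to roughly the following. This is only a sketch of those steps, not a
script that was actually run for this report; the scripts/config invocation and
the dmesg pattern are illustrative, and SELinux needs to be enabled since the
splat was not seen with CONFIG_SECURITY_SELINUX disabled:

# enable the lockdep options mentioned above in an existing .config
$ ./scripts/config --enable CONFIG_PROVE_LOCKING \
                   --enable CONFIG_PROVE_RAW_LOCK_NESTING \
                   --enable CONFIG_PROVE_RCU
$ make olddefconfig && make -j"$(nproc)"
# install and boot the resulting kernel, then:
$ sudo mount -t resctrl resctrl /sys/fs/resctrl
$ dmesg | grep -B3 -A10 'possible circular locking dependency'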