Re: [PATCH] iommu/vt-d: Fix a IOMMU perfmon warning when CPU hotplug

From: Baolu Lu
Date: Sat Mar 25 2023 - 09:00:31 EST


On 2023/3/24 23:11, kan.liang@xxxxxxxxxxxxxxx wrote:
From: Kan Liang<kan.liang@xxxxxxxxxxxxxxx>

A warning can be triggered when hotplug CPU 0.
$ echo 0 > /sys/devices/system/cpu/cpu0/online

[11958.737635] ------------[ cut here ]------------
[11958.742882] Voluntary context switch within RCU read-side critical
section!
[11958.742891] WARNING: CPU: 0 PID: 19 at kernel/rcu/tree_plugin.h:318
rcu_note_context_switch+0x4f4/0x580
[11958.860095] RIP: 0010:rcu_note_context_switch+0x4f4/0x580
[11958.960360] Call Trace:
[11958.963161] <TASK>
[11958.965565] ? perf_event_update_userpage+0x104/0x150
[11958.971293] __schedule+0x8d/0x960
[11958.975165] ? perf_event_set_state.part.82+0x11/0x50
[11958.980891] schedule+0x44/0xb0
[11958.984464] schedule_timeout+0x226/0x310
[11958.989017] ? __perf_event_disable+0x64/0x1a0
[11958.994054] ? _raw_spin_unlock+0x14/0x30
[11958.998605] wait_for_completion+0x94/0x130
[11959.003352] __wait_rcu_gp+0x108/0x130
[11959.007616] synchronize_rcu+0x67/0x70
[11959.011876] ? invoke_rcu_core+0xb0/0xb0
[11959.016333] ? __bpf_trace_rcu_stall_warning+0x10/0x10
[11959.022147] perf_pmu_migrate_context+0x121/0x370
[11959.027478] iommu_pmu_cpu_offline+0x6a/0xa0
[11959.032325] ? iommu_pmu_del+0x1e0/0x1e0
[11959.036782] cpuhp_invoke_callback+0x129/0x510
[11959.041825] cpuhp_thread_fun+0x94/0x150
[11959.046283] smpboot_thread_fn+0x183/0x220
[11959.050933] ? sort_range+0x20/0x20
[11959.054902] kthread+0xe6/0x110
[11959.058479] ? kthread_complete_and_exit+0x20/0x20
[11959.063911] ret_from_fork+0x1f/0x30
[11959.067982] </TASK>
[11959.070489] ---[ end trace 0000000000000000 ]---

The synchronize_rcu() will be invoked in the perf_pmu_migrate_context(),
when migrating a PMU to a new CPU. However, the current for_each_iommu()
is within RCU read-side critical section.

Use the dmar_global_lock to replace the RCU read lock when going through
the drhd list.

This introduces the following lockdep warning.

The iommu_pmu_register() path holds the dmar_global_lock and applies for
cpuhp_state lock; On the contrary, the iommu_pmu_cpu_offline() path
holds the cpuhp_state and applies for the dmar_global_lock.

Can you please have a look into it?

[12861.883945] ======================================================
[12861.884750] WARNING: possible circular locking dependency detected
[12861.885556] 6.3.0-rc3-00033-g5ff8930d087a #491 Not tainted
[12861.886272] ------------------------------------------------------
[12861.887077] cpuhp/0/19 is trying to acquire lock:
[12861.887703] ffffffff91c1d0d0 (dmar_global_lock){++++}-{3:3}, at: iommu_pmu_cpu_offline+0x62/0xd0
[12861.888839]
but task is already holding lock:
[12861.889605] ffffffff8e26ed40 (cpuhp_state-down){+.+.}-{0:0}, at: cpuhp_thread_fun+0x4b/0x210
[12861.890701]
which lock already depends on the new lock.

[12861.891747]
the existing dependency chain (in reverse order) is:
[12861.892713]
-> #2 (cpuhp_state-down){+.+.}-{0:0}:
[12861.893525] lock_acquire+0xc2/0x2a0
[12861.894074] cpuhp_kick_ap_work+0x5b/0x1d0
[12861.894692] _cpu_down+0xc3/0x300
[12861.895210] cpu_down+0x2e/0x50
[12861.895708] device_offline+0x8c/0xc0
[12861.896266] online_store+0x4e/0xa0
[12861.896804] kernfs_fop_write_iter+0x12a/0x1d0
[12861.897462] vfs_write+0x313/0x4c0
[12861.897989] ksys_write+0x60/0xe0
[12861.898507] do_syscall_64+0x43/0x90
[12861.899055] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[12861.899783]
-> #1 (cpu_hotplug_lock){++++}-{0:0}:
[12861.900595] lock_acquire+0xc2/0x2a0
[12861.901143] __cpuhp_setup_state+0x5b/0x1f0
[12861.901772] iommu_pmu_register+0x1a2/0x200
[12861.902400] intel_iommu_init+0x5de/0x880
[12861.903009] pci_iommu_init+0x12/0x40
[12861.903567] do_one_initcall+0x65/0x290
[12861.904155] kernel_init_freeable+0x3cf/0x560
[12861.904804] kernel_init+0x1a/0x140
[12861.905342] ret_from_fork+0x29/0x50
[12861.905890]
-> #0 (dmar_global_lock){++++}-{3:3}:
[12861.906702] check_prevs_add+0x160/0xee0
[12861.907300] __lock_acquire+0x143b/0x18c0
[12861.907909] lock_acquire+0xc2/0x2a0
[12861.908457] down_write+0x3f/0xc0
[12861.908976] iommu_pmu_cpu_offline+0x62/0xd0
[12861.909614] cpuhp_invoke_callback+0x175/0x670
[12861.910273] cpuhp_thread_fun+0x19d/0x210
[12861.910882] smpboot_thread_fn+0x11d/0x240
[12861.911501] kthread+0xeb/0x120
[12861.911998] ret_from_fork+0x29/0x50
[12861.912546]
other info that might help us debug this:

[12861.913572] Chain exists of:
dmar_global_lock --> cpu_hotplug_lock --> cpuhp_state-down

[12861.914977] Possible unsafe locking scenario:

[12861.915753] CPU0 CPU1
[12861.916358] ---- ----
[12861.916963] lock(cpuhp_state-down);
[12861.917460] lock(cpu_hotplug_lock);
[12861.918269] lock(cpuhp_state-down);
[12861.919077] lock(dmar_global_lock);
[12861.919575]
*** DEADLOCK ***

[12861.920350] 2 locks held by cpuhp/0/19:
[12861.920865] #0: ffffffff8e26ebd0 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x47/0x210
[12861.922014] #1: ffffffff8e26ed40 (cpuhp_state-down){+.+.}-{0:0}, at: cpuhp_thread_fun+0x4b/0x210
[12861.923162]
stack backtrace:
[12861.923747] CPU: 0 PID: 19 Comm: cpuhp/0 Not tainted 6.3.0-rc3-00033-g5ff8930d087a #491
[12861.924775] Hardware name: Intel Corporation BIRCHSTREAM/BIRCHSTREAM, BIOS BHSDREL1.86B.0018.D47.2211031932 11/03/2022
[12861.926133] Call Trace:
[12861.926478] <TASK>
[12861.926783] dump_stack_lvl+0x4a/0x80
[12861.927282] check_noncircular+0x102/0x120
[12861.927833] ? alloc_list_entry+0x13/0x130
[12861.928384] check_prevs_add+0x160/0xee0
[12861.928918] __lock_acquire+0x143b/0x18c0
[12861.929461] lock_acquire+0xc2/0x2a0
[12861.929950] ? iommu_pmu_cpu_offline+0x62/0xd0
[12861.930550] ? lock_is_held_type+0x9d/0x110
[12861.931113] down_write+0x3f/0xc0
[12861.931572] ? iommu_pmu_cpu_offline+0x62/0xd0
[12861.932171] iommu_pmu_cpu_offline+0x62/0xd0
[12861.932749] ? __pfx_iommu_pmu_cpu_offline+0x10/0x10
[12861.933408] cpuhp_invoke_callback+0x175/0x670
[12861.934007] ? lock_release+0xc3/0x250
[12861.934519] cpuhp_thread_fun+0x19d/0x210
[12861.935059] ? smpboot_thread_fn+0x2f/0x240
[12861.935618] ? smpboot_thread_fn+0x69/0x240
[12861.936177] ? smpboot_thread_fn+0x125/0x240
[12861.936757] smpboot_thread_fn+0x11d/0x240
[12861.937308] ? __pfx_smpboot_thread_fn+0x10/0x10
[12861.937927] kthread+0xeb/0x120
[12861.938365] ? __pfx_kthread+0x10/0x10
[12861.938874] ret_from_fork+0x29/0x50
[12861.939370] </TASK>

Best regards,
baolu