[Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspending the system

From: David Wang
Date: Tue Jul 30 2024 - 10:26:49 EST


Hi,

When I suspend my system via `systemctl suspend`, a kernel BUG shows up in the log:

kernel: [ 1734.412974] smpboot: CPU 2 is now offline
kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
kernel: [ 1734.414964] CPU: 0 UID: 0 PID: 4619 Comm: systemd-sleep Tainted: P OE 6.11.0-rc1-linan-4 #292
kernel: [ 1734.414968] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
kernel: [ 1734.414969] Hardware name: Micro-Star International Co., Ltd. MS-7B89/B450M MORTAR MAX (MS-7B89), BIOS 2.80 06/10/2020
kernel: [ 1734.414970] Call Trace:
kernel: [ 1734.414974] <TASK>
kernel: [ 1734.414978] dump_stack_lvl+0x60/0x80
kernel: [ 1734.414982] check_preemption_disabled+0xce/0xe0
kernel: [ 1734.414987] hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
kernel: [ 1734.414992] ? __pfx_takedown_cpu+0x10/0x10
kernel: [ 1734.414996] takedown_cpu+0x97/0x130
kernel: [ 1734.414999] cpuhp_invoke_callback+0xf8/0x450
kernel: [ 1734.415004] __cpuhp_invoke_callback_range+0x78/0xe0
kernel: [ 1734.415008] _cpu_down+0xf4/0x360
kernel: [ 1734.415012] freeze_secondary_cpus+0xae/0x290
kernel: [ 1734.415016] suspend_devices_and_enter+0x1da/0x920
kernel: [ 1734.415022] pm_suspend+0x1fa/0x500
kernel: [ 1734.415025] state_store+0x68/0xd0
kernel: [ 1734.415028] kernfs_fop_write_iter+0x169/0x1f0
kernel: [ 1734.415034] vfs_write+0x269/0x440
kernel: [ 1734.415041] ksys_write+0x63/0xe0
kernel: [ 1734.415044] do_syscall_64+0x4b/0x110
kernel: [ 1734.415048] entry_SYSCALL_64_after_hwframe+0x76/0x7e
kernel: [ 1734.415052] RIP: 0033:0x7fe885cee240
kernel: [ 1734.415055] Code: 40 00 48 8b 15 c1 9b 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 23 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
kernel: [ 1734.415057] RSP: 002b:00007ffc53ccec58 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
kernel: [ 1734.415060] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fe885cee240
kernel: [ 1734.415062] RDX: 0000000000000004 RSI: 00007ffc53cced40 RDI: 0000000000000004
kernel: [ 1734.415063] RBP: 00007ffc53cced40 R08: 0000000000000007 R09: 000055f34dde8210
kernel: [ 1734.415064] R10: 6bccc22257390b18 R11: 0000000000000202 R12: 0000000000000004
kernel: [ 1734.415066] R13: 000055f34dde42d0 R14: 0000000000000004 R15: 00007fe885dc49e0
kernel: [ 1734.415071] </TASK>


I confirmed that this was introduced by commit
f7d43dd206e7e18c182f200e67a8db8c209907fa ("tick/broadcast: Make takeover of broadcast hrtimer reliable"),
and that reverting this commit fixes it.


Thanks
David