On Thu, Mar 27, 2025 at 02:58:12PM +0200, Kirill A. Shutemov wrote:
On Wed, Mar 26, 2025 at 05:30:35PM -0500, Tom Lendacky wrote:
On 3/25/25 08:33, Kirill A. Shutemov wrote:
On Tue, Mar 25, 2025 at 02:40:00PM +0530, Aithal, Srikanth wrote:
Hello,
Starting with linux-next build next-20250312, and including the recent build
next-20250324, we are seeing an issue where SNP guest boot hangs at the
"Booting SMP configuration" step:
[ 2.294722] smp: Bringing up secondary CPUs ...
[ 2.295211] smpboot: Parallel CPU startup disabled by the platform
[ 2.309687] smpboot: x86: Booting SMP configuration:
[ 2.310214] .... node #0, CPUs: #1 #2 #3 #4 #5 #6
#7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21
#22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36
#37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47 #48 #49 #50 #51
#52 #53 #54 #55 #56 #57 #58 #59 #60 #61 #62 #63 #64 #65 #66
#67 #68 #69 #70 #71 #72 #73 #74 #75 #76 #77 #78 #79 #80 #81
#82 #83 #84 #85 #86 #87 #88 #89 #90 #91 #92 #93 #94 #95 #96
#97 #98 #99 #100 #101 #102 #103 #104 #105 #106 #107 #108 #109 #110 #111
#112 #113 #114 #115 #116 #117 #118 #119 #120 #121 #122 #123 #124 #125 #126
#127 #128 #129 #130 #131 #132 #133 #134 #135 #136 #137 #138 #139 #140 #141
#142 #143 #144 #145 #146 #147 #148 #149 #150 #151 #152 #153 #154 #155 #156
#157 #158 #159 #160 #161 #162 #163 #164 #165 #166 #167 #168 #169 #170 #171
#172 #173 #174 #175 #176 #177 #178 #179 #180 #181 #182 #183 #184 #185 #186
#187 #188 #189 #190 #191 #192 #193 #194 #195 #196 #197 #198
--> The guest hangs forever at this point.
I have observed that certain vCPU and memory combinations work, while others
do not. The VM configuration I am using does not have any NUMA nodes.
vcpus        Mem       SNP guest boot
<=240        19456M    Boots fine
=241,<255    19456M    Hangs
1-255        2048M     Boots fine
1-255        4096M     Boots fine
71           8192M     Hangs
41           6144M     Hangs
When I bisected this issue, it pointed to the following commit:
commit 800f1059c99e2b39899bdc67a7593a7bea6375d8
Author: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Date: Mon Mar 10 10:28:55 2025 +0200
mm/page_alloc: fix memory accept before watermarks gets initialized
Hm. This is puzzling to me. I don't see how this commit could cause the hang.
Could you track down where the hang happens?
The guest config is key here. Using that config, I think you might be able to
reproduce this on TDX as well. The config turns TDX support off, so I'm hoping
that turning it on doesn't change anything.
I've been able to track it down a bit further. It happens during the CPU
bringup trace-point setup: the boot eventually reaches line 2273 in
rb_allocate_cpu_buffer() and never comes back from an alloc_pages_node() call.
That's as far as I've gotten so far. I'm not an mm expert, so I'm not sure
I'll be able to get much further.
Urgh... It is a deadlock on cpu_hotplug_lock :/
_cpu_up() takes the lock on write and starts CPU bring-up under it.
If during CPU bringup we accept the last page in the zone, __accept_page()
calls static_branch_dec(), which takes the lock again, this time on read.
Oopsie.
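To make the recursion concrete, here is a minimal user-space model of the
pattern (purely illustrative, not kernel code): a pthread rwlock stands in for
the percpu rw-semaphore behind cpu_hotplug_lock, the write side models the
cpus_write_lock() taken in _cpu_up(), and the read side models the
cpus_read_lock() that static_key_slow_dec() takes.

/* Illustrative user-space model only -- not kernel code. */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Stands in for the percpu rw-semaphore behind cpu_hotplug_lock. */
static pthread_rwlock_t hotplug_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Models static_key_slow_dec() -> cpus_read_lock(). */
static void static_branch_dec_model(void)
{
	int err = pthread_rwlock_rdlock(&hotplug_lock);

	if (err) {
		/* Some implementations (e.g. glibc) detect the self-deadlock
		 * and return EDEADLK; the kernel's percpu rw-semaphore does
		 * no such check and simply blocks forever. */
		printf("read side refused: %s\n", strerror(err));
		return;
	}
	pthread_rwlock_unlock(&hotplug_lock);
}

/* Models _cpu_up() -> cpus_write_lock() plus the bringup callbacks. */
static void cpu_up_model(void)
{
	pthread_rwlock_wrlock(&hotplug_lock);
	/* ... callback allocates memory, accepts the last page in a zone ... */
	static_branch_dec_model();
	pthread_rwlock_unlock(&hotplug_lock);
}

int main(void)
{
	cpu_up_model();
	return 0;
}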
So the patch itself doesn't introduce a regression, but uncovers a
preexisting deadlock. With the patch we accept more pages during boot, which
triggers the deadlock.
Let me think about the fix.
+ Static branch maintainers
The only option I see so far is to drop the static branch from this path.
But I am not sure it is the only place where we use a static branch from CPU
hotplug callbacks.
Any other ideas?
The deadlock I'm talking about:
============================================
WARNING: possible recursive locking detected
6.14.0-rc5+ #13 Tainted: G S
--------------------------------------------
swapper/0/1 is trying to acquire lock:
ffffffffbdc7e150 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_slow_dec (kernel/jump_label.c:321 kernel/jump_label.c:336)
but task is already holding lock:
ffffffffbdc7e150 (cpu_hotplug_lock){++++}-{0:0}, at: _cpu_up (./arch/x86/include/asm/bitops.h:227 ./arch/x86/include/asm/bitops.h:239 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 ./include/linux/cpumask.h:570 ./include/linux/cpumask.h:1131 kernel/cpu.c:1641)
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(cpu_hotplug_lock);
lock(cpu_hotplug_lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by swapper/0/1:
#0: ffffffffbdc7e058 (cpu_add_remove_lock){+.+.}-{4:4}, at: cpu_up (kernel/cpu.c:? kernel/cpu.c:1712)
#1: ffffffffbdc7e150 (cpu_hotplug_lock){++++}-{0:0}, at: _cpu_up (./arch/x86/include/asm/bitops.h:227 ./arch/x86/include/asm/bitops.h:239 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 ./include/linux/cpumask.h:570 ./include/linux/cpumask.h:1131 kernel/cpu.c:1641)
stack backtrace:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G S 6.14.0-rc5+ #13
Tainted: [S]=CPU_OUT_OF_SPEC
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:122)
print_deadlock_bug (kernel/locking/lockdep.c:3041)
__lock_acquire (kernel/locking/lockdep.c:? kernel/locking/lockdep.c:3893 kernel/locking/lockdep.c:5228)
? asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:702)
? free_one_page (mm/page_alloc.c:?)
? static_key_slow_dec (kernel/jump_label.c:321 kernel/jump_label.c:336)
lock_acquire (kernel/locking/lockdep.c:5851)
? static_key_slow_dec (kernel/jump_label.c:321 kernel/jump_label.c:336)
cpus_read_lock (./include/linux/percpu-rwsem.h:51)
? static_key_slow_dec (kernel/jump_label.c:321 kernel/jump_label.c:336)
static_key_slow_dec (kernel/jump_label.c:321 kernel/jump_label.c:336)
cond_accept_memory (mm/page_alloc.c:7024)
get_page_from_freelist (./arch/x86/include/asm/bitops.h:206 ./arch/x86/include/asm/bitops.h:238 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 mm/page_alloc.c:3417)
? lock_release (kernel/locking/lockdep.c:469)
__alloc_frozen_pages_noprof (mm/page_alloc.c:4740)
__alloc_pages_noprof (mm/page_alloc.c:4774)
rb_allocate_cpu_buffer (kernel/trace/ring_buffer.c:2235)
? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
trace_rb_cpu_prepare (kernel/trace/ring_buffer.c:7322)
cpuhp_invoke_callback (kernel/cpu.c:216)
? __pfx_trace_rb_cpu_prepare (kernel/trace/ring_buffer.c:7297 kernel/trace/ring_buffer.c:7304)
_cpu_up (kernel/cpu.c:967 kernel/cpu.c:990 kernel/cpu.c:1021 kernel/cpu.c:1691)
cpu_up (kernel/cpu.c:473 kernel/cpu.c:1725)
cpuhp_bringup_mask (kernel/cpu.c:1789)
? kernel_init (init/main.c:1459)
smp_init (./include/linux/bitmap.h:445 ./include/linux/nodemask.h:241 ./include/linux/nodemask.h:438 kernel/smp.c:1012)
kernel_init_freeable (init/main.c:1561)
? __pfx_kernel_init (init/main.c:1455)
kernel_init (init/main.c:1459)
ret_from_fork (arch/x86/kernel/process.c:148)
? __pfx_kernel_init (init/main.c:1455)
ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
</TASK>