On Thu, Mar 27, 2025 at 02:58:12PM +0200, Kirill A. Shutemov wrote:
On Wed, Mar 26, 2025 at 05:30:35PM -0500, Tom Lendacky wrote:
On 3/25/25 08:33, Kirill A. Shutemov wrote:
On Tue, Mar 25, 2025 at 02:40:00PM +0530, Aithal, Srikanth wrote:
Hello,
Starting with linux-next build next-20250312, and including the recent build
next-20250324, we are seeing an issue where SNP guest boot hangs at the
"Booting SMP configuration" step:
[ 2.294722] smp: Bringing up secondary CPUs ...
[ 2.295211] smpboot: Parallel CPU startup disabled by the platform
[ 2.309687] smpboot: x86: Booting SMP configuration:
[ 2.310214] .... node #0, CPUs: #1 #2 #3 #4 #5 #6
#7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21
#22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36
#37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47 #48 #49 #50 #51
#52 #53 #54 #55 #56 #57 #58 #59 #60 #61 #62 #63 #64 #65 #66
#67 #68 #69 #70 #71 #72 #73 #74 #75 #76 #77 #78 #79 #80 #81
#82 #83 #84 #85 #86 #87 #88 #89 #90 #91 #92 #93 #94 #95 #96
#97 #98 #99 #100 #101 #102 #103 #104 #105 #106 #107 #108 #109 #110 #111
#112 #113 #114 #115 #116 #117 #118 #119 #120 #121 #122 #123 #124 #125 #126
#127 #128 #129 #130 #131 #132 #133 #134 #135 #136 #137 #138 #139 #140 #141
#142 #143 #144 #145 #146 #147 #148 #149 #150 #151 #152 #153 #154 #155 #156
#157 #158 #159 #160 #161 #162 #163 #164 #165 #166 #167 #168 #169 #170 #171
#172 #173 #174 #175 #176 #177 #178 #179 #180 #181 #182 #183 #184 #185 #186
#187 #188 #189 #190 #191 #192 #193 #194 #195 #196 #197 #198
--> The guest hangs forever at this point.
I have observed that certain vCPU and memory combinations work, while others
do not. The VM configuration I am using does not have any NUMA nodes.
vcpus        Mem       SNP guest boot
<=240        19456M    Boots fine
=241,<255    19456M    Hangs
1-255        2048M     Boots fine
1-255        4096M     Boots fine
71           8192M     Hangs
41           6144M     Hangs
When I bisected this issue, it pointed to the following commit:
commit 800f1059c99e2b39899bdc67a7593a7bea6375d8
Author: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Date: Mon Mar 10 10:28:55 2025 +0200
mm/page_alloc: fix memory accept before watermarks gets initialized
Hm. This is puzzling to me. I don't see how this commit could cause the hang.
Could you track down where the hang happens?
The guest config is key here. Using that config, I think you might be able to
reproduce this on TDX as well. The config turns TDX support off, so I'm hoping
that turning it on doesn't change anything.
I've been able to track it down a bit further. It happens during the CPU
bringup trace-point setup: the boot eventually reaches line 2273 in
rb_allocate_cpu_buffer() and never comes back from an alloc_pages_node() call.
That's as far as I've gotten so far. I'm not an mm expert, so I'm not sure
I'll be able to get much further.
Urgh... It is a deadlock on cpu_hotplug_lock :/
_cpu_up() takes the lock on write and starts CPU bring-up under it.
If during CPU bringup we accept the last page in the zone, __accept_page()
calls static_branch_dec(), which takes the lock again, this time on read.
Oopsie.
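To make the recursion concrete, here is a minimal user-space model of the
pattern (purely illustrative, not kernel code): a pthread rwlock stands in for
the percpu rw-semaphore behind cpu_hotplug_lock, the write side models the
cpus_write_lock() taken in _cpu_up(), and the read side models the
cpus_read_lock() that static_key_slow_dec() takes.

/* Illustrative user-space model only -- not kernel code. */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Stands in for the percpu rw-semaphore behind cpu_hotplug_lock. */
static pthread_rwlock_t hotplug_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Models static_key_slow_dec() -> cpus_read_lock(). */
static void static_branch_dec_model(void)
{
	int err = pthread_rwlock_rdlock(&hotplug_lock);

	if (err) {
		/* Some implementations (e.g. glibc) detect the self-deadlock
		 * and return EDEADLK; the kernel's percpu rw-semaphore does
		 * no such check and simply blocks forever. */
		printf("read side refused: %s\n", strerror(err));
		return;
	}
	pthread_rwlock_unlock(&hotplug_lock);
}

/* Models _cpu_up() -> cpus_write_lock() plus the bringup callbacks. */
static void cpu_up_model(void)
{
	pthread_rwlock_wrlock(&hotplug_lock);
	/* ... callback allocates memory, accepts the last page in a zone ... */
	static_branch_dec_model();
	pthread_rwlock_unlock(&hotplug_lock);
}

int main(void)
{
	cpu_up_model();
	return 0;
}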
So the patch itself doesn't introduce a regression, but uncovers a
preexisting deadlock. With the patch we accept more pages during boot, which
triggers the deadlock.
Let me think about the fix.
+ Static branch maintainers
The only option I see so far is to drop the static branch from this path.
But I am not sure it is the only place where we use a static branch from CPU
hotplug callbacks.
Any other ideas?
The deadlock I'm talking about:
============================================
WARNING: possible recursive locking detected
6.14.0-rc5+ #13 Tainted: G S
--------------------------------------------
swapper/0/1 is trying to acquire lock:
ffffffffbdc7e150 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_slow_dec (kernel/jump_label.c:321 kernel/jump_label.c:336)
but task is already holding lock:
ffffffffbdc7e150 (cpu_hotplug_lock){++++}-{0:0}, at: _cpu_up (./arch/x86/include/asm/bitops.h:227 ./arch/x86/include/asm/bitops.h:239 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 ./include/linux/cpumask.h:570 ./include/linux/cpumask.h:1131 kernel/cpu.c:1641)
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(cpu_hotplug_lock);
lock(cpu_hotplug_lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by swapper/0/1:
#0: ffffffffbdc7e058 (cpu_add_remove_lock){+.+.}-{4:4}, at: cpu_up (kernel/cpu.c:? kernel/cpu.c:1712)
#1: ffffffffbdc7e150 (cpu_hotplug_lock){++++}-{0:0}, at: _cpu_up (./arch/x86/include/asm/bitops.h:227 ./arch/x86/include/asm/bitops.h:239 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 ./include/linux/cpumask.h:570 ./include/linux/cpumask.h:1131 kernel/cpu.c:1641)
stack backtrace:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G S 6.14.0-rc5+ #13
Tainted: [S]=CPU_OUT_OF_SPEC
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:122)
print_deadlock_bug (kernel/locking/lockdep.c:3041)
__lock_acquire (kernel/locking/lockdep.c:? kernel/locking/lockdep.c:3893 kernel/locking/lockdep.c:5228)
? asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:702)
? free_one_page (mm/page_alloc.c:?)
? static_key_slow_dec (kernel/jump_label.c:321 kernel/jump_label.c:336)
lock_acquire (kernel/locking/lockdep.c:5851)
? static_key_slow_dec (kernel/jump_label.c:321 kernel/jump_label.c:336)
cpus_read_lock (./include/linux/percpu-rwsem.h:51)
? static_key_slow_dec (kernel/jump_label.c:321 kernel/jump_label.c:336)
static_key_slow_dec (kernel/jump_label.c:321 kernel/jump_label.c:336)
cond_accept_memory (mm/page_alloc.c:7024)
get_page_from_freelist (./arch/x86/include/asm/bitops.h:206 ./arch/x86/include/asm/bitops.h:238 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 mm/page_alloc.c:3417)
? lock_release (kernel/locking/lockdep.c:469)
__alloc_frozen_pages_noprof (mm/page_alloc.c:4740)
__alloc_pages_noprof (mm/page_alloc.c:4774)
rb_allocate_cpu_buffer (kernel/trace/ring_buffer.c:2235)
? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
trace_rb_cpu_prepare (kernel/trace/ring_buffer.c:7322)
cpuhp_invoke_callback (kernel/cpu.c:216)
? __pfx_trace_rb_cpu_prepare (kernel/trace/ring_buffer.c:7297 kernel/trace/ring_buffer.c:7304)
_cpu_up (kernel/cpu.c:967 kernel/cpu.c:990 kernel/cpu.c:1021 kernel/cpu.c:1691)
cpu_up (kernel/cpu.c:473 kernel/cpu.c:1725)
cpuhp_bringup_mask (kernel/cpu.c:1789)
? kernel_init (init/main.c:1459)
smp_init (./include/linux/bitmap.h:445 ./include/linux/nodemask.h:241 ./include/linux/nodemask.h:438 kernel/smp.c:1012)
kernel_init_freeable (init/main.c:1561)
? __pfx_kernel_init (init/main.c:1455)
kernel_init (init/main.c:1459)
ret_from_fork (arch/x86/kernel/process.c:148)
? __pfx_kernel_init (init/main.c:1455)
ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
</TASK>