Re: linux-next regression: SNP Guest boot hangs with certain cpu/mem config combination

From: Kirill A. Shutemov
Date: Thu Mar 27 2025 - 10:45:18 EST


On Thu, Mar 27, 2025 at 10:35:33AM -0400, Jason Baron wrote:
>
>
> On 3/27/25 10:25 AM, Kirill A. Shutemov wrote:
> >
> > On Thu, Mar 27, 2025 at 02:58:12PM +0200, Kirill A. Shutemov wrote:
> > > On Wed, Mar 26, 2025 at 05:30:35PM -0500, Tom Lendacky wrote:
> > > > On 3/25/25 08:33, Kirill A. Shutemov wrote:
> > > > > On Tue, Mar 25, 2025 at 02:40:00PM +0530, Aithal, Srikanth wrote:
> > > > > > Hello,
> > > > > >
> > > > > >
> > > > > > Starting linux-next build next-20250312, including recent build 20250324, we
> > > > > > are seeing an issue where the SNP guest boot hangs at the "boot smp config"
> > > > > > step:
> > > > > >
> > > > > >
> > > > > > [    2.294722] smp: Bringing up secondary CPUs ...
> > > > > > [    2.295211] smpboot: Parallel CPU startup disabled by the platform
> > > > > > [    2.309687] smpboot: x86: Booting SMP configuration:
> > > > > > [    2.310214] .... node  #0, CPUs:          #1   #2   #3   #4 #5   #6
> > > > > > #7   #8   #9  #10  #11  #12  #13  #14  #15  #16  #17 #18  #19  #20  #21
> > > > > > #22  #23  #24  #25  #26  #27  #28  #29  #30 #31  #32  #33  #34  #35  #36
> > > > > > #37  #38  #39  #40  #41  #42  #43 #44  #45  #46  #47  #48  #49  #50  #51
> > > > > > #52  #53  #54  #55  #56 #57  #58  #59  #60  #61  #62  #63  #64  #65  #66
> > > > > > #67  #68  #69 #70  #71  #72  #73  #74  #75  #76  #77  #78  #79  #80  #81
> > > > > > #82 #83  #84  #85  #86  #87  #88  #89  #90  #91  #92  #93  #94  #95 #96
> > > > > > #97  #98  #99 #100 #101 #102 #103 #104 #105 #106 #107 #108 #109 #110 #111
> > > > > > #112 #113 #114 #115 #116 #117 #118 #119 #120 #121 #122 #123 #124 #125 #126
> > > > > > #127 #128 #129 #130 #131 #132 #133 #134 #135 #136 #137 #138 #139 #140 #141
> > > > > > #142 #143 #144 #145 #146 #147 #148 #149 #150 #151 #152 #153 #154 #155 #156
> > > > > > #157 #158 #159 #160 #161 #162 #163 #164 #165 #166 #167 #168 #169 #170 #171
> > > > > > #172 #173 #174 #175 #176 #177 #178 #179 #180 #181 #182 #183 #184 #185 #186
> > > > > > #187 #188 #189 #190 #191 #192 #193 #194 #195 #196 #197 #198
> > > > > > --> The guest hangs forever at this point.
> > > > > >
> > > > > >
> > > > > > I have observed that certain vCPU and memory combinations work, while others
> > > > > > do not. The VM configuration I am using does not have any NUMA nodes.
> > > > > >
> > > > > > vcpus        Mem       SNP guest boot
> > > > > > <=240        19456M    Boots fine
> > > > > > >=241,<255   19456M    Hangs
> > > > > > 1-255        2048M     Boots fine
> > > > > > 1-255        4096M     Boots fine
> > > > > > >71          8192M     Hangs
> > > > > > >41          6144M     Hangs
> > > > > >
> > > > > > When I bisected this issue, it pointed to the following commit:
> > > > > >
> > > > > >
> > > > > > commit 800f1059c99e2b39899bdc67a7593a7bea6375d8
> > > > > > Author: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> > > > > > Date:   Mon Mar 10 10:28:55 2025 +0200
> > > > > >
> > > > > >     mm/page_alloc: fix memory accept before watermarks gets initialized
> > > > >
> > > > > Hm. This is puzzling to me. I don't see how this commit can cause the hang.
> > > > >
> > > > > Could you track down where the hang happens?
> > > >
> > > > Let me say that the guest config is key for this. Using that config, I
> > > > think you might be able to repro this on TDX. The config does turn off TDX
> > > > support, so I'm hoping that turning it on doesn't change anything.
> > > >
> > > > I've been able to track it down slightly... It is happening during the CPU
> > > > bringup trace points and it eventually gets to line 2273 in
> > > > rb_allocate_cpu_buffer() and never comes back from an alloc_pages_node()
> > > > call. That's as far as I've gotten so far. I'm not a mm expert so not sure
> > > > if I'll be able to progress much further.
> > >
> > > Urgh... It is a deadlock on cpu_hotplug_lock :/
> > >
> > > _cpu_up() takes the lock for write and starts CPU bringup under it.
> > > If, during CPU bringup, we accept the last page in the zone, __accept_page()
> > > calls static_branch_dec(), which takes the lock again.
> > >
> > > Oopsie.
> > >
> > > So the patch itself doesn't introduce a regression, but uncovers a
> > > preexisting deadlock. With the patch we accept more pages during boot,
> > > and that triggers the deadlock.
> > >
> > > Let me think about the fix.
> >
> > + Static branch maintainers
> >
> > The only option I see so far is to drop the static branch from this path.
> >
> > But I am not sure if it is the only case where we use a static branch
> > from CPU hotplug callbacks.
> >
> > Any other ideas?
>
>
> Hmmm, I didn't take too close a look here, but there is the
> static_key_slow_dec_cpuslocked() variant. Would that work here? Is the issue
> that the caller may or may not hold the cpu_hotplug lock?

Yes. This is the generic page allocation path, and it can be called both
with and without the lock held.
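
To make the problem concrete, here is a rough userspace sketch of the lock
ordering, not kernel code. Every toy_* name is made up for illustration, and
the real cpu_hotplug_lock (a percpu rwsem taken for write by _cpu_up() and
for read by the static branch code) is reduced to a single non-recursive
flag. Where the kernel would block forever, the toy lock returns false
instead. The _cpuslocked behaviour shown is an assumption about how that
variant expects to be called.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for cpu_hotplug_lock: one owner, not recursive. */
struct toy_lock { bool held; };

static struct toy_lock hotplug_lock;
static int zones_with_unaccepted = 1;	/* stand-in for the static key count */

/* Returns false where the real lock would block forever. */
static bool toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Models static_branch_dec(): takes the hotplug lock itself. */
static bool toy_static_branch_dec(void)
{
	if (!toy_lock_acquire(&hotplug_lock))
		return false;	/* self-deadlock in the real kernel */
	zones_with_unaccepted--;
	toy_lock_release(&hotplug_lock);
	return true;
}

/* Models the _cpuslocked variant: assumes the caller holds the lock. */
static bool toy_static_branch_dec_cpuslocked(void)
{
	if (!hotplug_lock.held)
		return false;	/* precondition violated */
	zones_with_unaccepted--;
	return true;
}

/* Models accepting the last unaccepted page in a zone. */
static bool toy_accept_last_page(void)
{
	return toy_static_branch_dec();
}

/* Models _cpu_up(): holds the lock while bringup allocates memory. */
static bool toy_cpu_up(void)
{
	bool ok;

	if (!toy_lock_acquire(&hotplug_lock))
		return false;
	ok = toy_accept_last_page();
	toy_lock_release(&hotplug_lock);
	return ok;
}
```

In this model toy_accept_last_page() succeeds when called on its own and
fails when reached through toy_cpu_up(), while the _cpuslocked variant has
the opposite requirement. A path that is entered both ways cannot pick
either one unconditionally.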

--
Kiryl Shutsemau / Kirill A. Shutemov