Re: linux-next regression: SNP Guest boot hangs with certain cpu/mem config combination
From: Kirill A. Shutemov
Date: Fri Mar 28 2025 - 05:10:07 EST
On Fri, Mar 28, 2025 at 10:28:19AM +0200, Kirill A. Shutemov wrote:
> On Thu, Mar 27, 2025 at 07:39:22PM +0200, Kirill A. Shutemov wrote:
> > On Thu, Mar 27, 2025 at 11:02:24AM -0400, Steven Rostedt wrote:
> > > On Thu, 27 Mar 2025 16:43:43 +0200
> > > "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> wrote:
> > >
> > > > > > The only option I see so far is to drop static branch from this path.
> > > > > >
> > > > > > > But I am not sure if it is the only case where we use static branch from CPU
> > > > > > hotplug callbacks.
> > > > > >
> > > > > > Any other ideas?
> > > > >
> > > > >
> > > > > Hmmm, didn't take too close a look here, but there is the
> > > > > static_key_slow_dec_cpuslocked() variant, would that work here? Is the issue
> > > > > the caller may or may not have the cpu_hotplug lock?
> > > >
> > > > Yes. This is generic page alloc path and it can be called with and without
> > > > the lock.
> > >
> > > Note, it's not the static_branch that is an issue, it's enabling/disabling
> > > the static branch that is. Changing a static branch takes a bit of work as
> > > it does modify the kernel text.
> > >
> > > Is it possible to delay the update via a workqueue?
> >
> > Ah. Good point. Should work. I'll give it a try.
>
> The patch below fixes the problem for me.

Ah. No, it won't work. We can get there before workqueues are initialized:
mm_core_init() is called before workqueue_init_early(), so at that point
there is no workqueue to queue the work on. :/
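
To make the ordering problem concrete, here is a rough userspace model of it.
This is only a sketch, not kernel code: every model_* name is made up for
illustration, and the NULL pointer merely stands in for "workqueues are not
set up yet". It shows that a deferral attempted from the mm init path fails,
while the same attempt after workqueue init succeeds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real kernel symbols. */
struct model_wq {
	int pending;	/* number of deferred static-branch updates */
};

static struct model_wq model_wq_storage;

/* NULL until the modelled workqueue_init_early() has run. */
static struct model_wq *model_system_wq;

/* Models workqueue_init_early(): from here on, work can be queued. */
static void model_workqueue_init_early(void)
{
	model_system_wq = &model_wq_storage;
}

/*
 * Models "defer the static branch update to a workqueue".
 * Returns true if the update was queued, false if it is too early.
 */
static bool model_queue_update(void)
{
	if (!model_system_wq)
		return false;	/* nothing to queue on yet */

	model_system_wq->pending++;
	return true;
}

/*
 * Models the page alloc path being reached from mm_core_init(),
 * which runs before workqueue_init_early() during boot.
 */
static bool model_mm_core_init(void)
{
	return model_queue_update();
}
```

In this model, calling model_mm_core_init() first returns false, and
model_queue_update() only starts returning true once
model_workqueue_init_early() has run. Any deferral scheme would need to
handle that early window some other way.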
Steven, any other ideas?
--
Kiryl Shutsemau / Kirill A. Shutemov