Re: [PATCH] kexec: do syscore_shutdown() in kernel_kexec
From: Gowans, James
Date: Tue Jan 09 2024 - 01:59:53 EST
+ akpm
Hi Eric and Andrew,
Just checking in on this patch.
I would be keen to get the fix merged if you're okay with it, or to get
some feedback if not.
I am also still keen for input from the driver maintainers in CC on
whether they support or object to their shutdown hooks being invoked on
kexec.
JG
On Mon, 2023-12-18 at 14:41 +0200, James Gowans wrote:
> Hi Eric,
>
> On Wed, 2023-12-13 at 10:39 -0600, Eric W. Biederman wrote:
> >
> > James Gowans <jgowans@xxxxxxxxxx> writes:
> >
> > > syscore_shutdown() runs driver and module callbacks to get the system
> > > into a state where it can be correctly shut down. In commit
> > > 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()")
> > > syscore_shutdown() was removed from kernel_restart_prepare() and hence
> > > got (incorrectly?) removed from the kexec flow. This was innocuous until
> > > commit 6735150b6997 ("KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown")
> > > changed the way that KVM registered its shutdown callbacks, switching from
> > > reboot notifiers to syscore_ops.shutdown. As syscore_shutdown() is
> > > missing from kexec, KVM's shutdown hook is not run and virtualisation is
> > > left enabled on the boot CPU which results in triple faults when
> > > switching to the new kernel on Intel x86 VT-x with VMXE enabled.
> > >
> > > Fix this by adding syscore_shutdown() to the kexec sequence. In terms of
> > > where to add it, it is being added after migrating the kexec task to the
> > > boot CPU, but before APs are shut down. It is not totally clear if this
> > > is the best place: in commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()")
> > > it is stated that "syscore_ops operations should be carried with one
> > > CPU on-line and interrupts disabled." APs are only offlined later in
> > > machine_shutdown(), so this syscore_shutdown() is being run while APs
> > > are still online. This seems to be the correct place as it matches where
> > > syscore_shutdown() is run in the reboot and halt flows - they also run
> > > it before APs are shut down. The assumption is that the commit message
> > > in commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()")
> > > is no longer valid.
> > >
> > > KVM has been discussed here as it is what broke loudly by not having
> > > syscore_shutdown() in kexec, but this change impacts more than just KVM;
> > > all drivers/modules which register a syscore_ops.shutdown callback will
> > > now be invoked in the kexec flow. Looking at some of them like x86 MCE
> > > it is probably more correct to also shut these down during kexec.
> > > Maintainers of all drivers which use syscore_ops.shutdown are added on
> > > CC for visibility. They are:
> > >
> > > arch/powerpc/platforms/cell/spu_base.c .shutdown = spu_shutdown,
> > > arch/x86/kernel/cpu/mce/core.c .shutdown = mce_syscore_shutdown,
> > > arch/x86/kernel/i8259.c .shutdown = i8259A_shutdown,
> > > drivers/irqchip/irq-i8259.c .shutdown = i8259A_shutdown,
> > > drivers/irqchip/irq-sun6i-r.c .shutdown = sun6i_r_intc_shutdown,
> > > drivers/leds/trigger/ledtrig-cpu.c .shutdown = ledtrig_cpu_syscore_shutdown,
> > > drivers/power/reset/sc27xx-poweroff.c .shutdown = sc27xx_poweroff_shutdown,
> > > kernel/irq/generic-chip.c .shutdown = irq_gc_shutdown,
> > > virt/kvm/kvm_main.c .shutdown = kvm_shutdown,
> > >
> > > This has been tested by doing a kexec on x86_64 and aarch64.
> >
> > From the 10,000 foot perspective:
> > Acked-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>
> Thanks for the ACK!
> What's the next step to get this into the kexec tree?
>
> JG
>
> >
> >
> > Eric
> >
> > > Fixes: 6735150b6997 ("KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown")
> > >
> > > Signed-off-by: James Gowans <jgowans@xxxxxxxxxx>
> > > Cc: Eric Biederman <ebiederm@xxxxxxxxxxxx>
> > > Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > > Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
> > > Cc: Marc Zyngier <maz@xxxxxxxxxx>
> > > Cc: Arnd Bergmann <arnd@xxxxxxxx>
> > > Cc: Tony Luck <tony.luck@xxxxxxxxx>
> > > Cc: Borislav Petkov <bp@xxxxxxxxx>
> > > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > > Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> > > Cc: Chen-Yu Tsai <wens@xxxxxxxx>
> > > Cc: Jernej Skrabec <jernej.skrabec@xxxxxxxxx>
> > > Cc: Samuel Holland <samuel@xxxxxxxxxxxx>
> > > Cc: Pavel Machek <pavel@xxxxxx>
> > > Cc: Sebastian Reichel <sre@xxxxxxxxxx>
> > > Cc: Orson Zhai <orsonzhai@xxxxxxxxx>
> > > Cc: Alexander Graf <graf@xxxxxxxxx>
> > > Cc: Jan H. Schoenherr <jschoenh@xxxxxxxxx>
> > > ---
> > > kernel/kexec_core.c | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > > index be5642a4ec49..b926c4db8a91 100644
> > > --- a/kernel/kexec_core.c
> > > +++ b/kernel/kexec_core.c
> > > @@ -1254,6 +1254,7 @@ int kernel_kexec(void)
> > > kexec_in_progress = true;
> > > kernel_restart_prepare("kexec reboot");
> > > migrate_to_reboot_cpu();
> > > + syscore_shutdown();
> > >
> > > /*
> > > * migrate_to_reboot_cpu() disables CPU hotplug assuming that
>
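
For readers following the thread, here is a minimal user-space sketch of the
behaviour the commit message describes: a shutdown hook registered through a
syscore-style ops table only runs if the kexec path actually calls the
shutdown walker. This is not kernel code. Every model_* name below is
hypothetical and exists only for illustration, and the reverse-registration
walk order is an assumption about how the real syscore_shutdown() behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct syscore_ops. */
struct model_syscore_ops {
	void (*shutdown)(void);
};

#define MODEL_MAX_OPS 8
static struct model_syscore_ops *model_ops[MODEL_MAX_OPS];
static size_t model_nr_ops;

/* Models the "virtualisation enabled" state that KVM's hook should clear. */
static bool model_virt_enabled = true;

/* Append an ops entry to the table (illustrative registration helper). */
static void model_register_syscore_ops(struct model_syscore_ops *ops)
{
	assert(model_nr_ops < MODEL_MAX_OPS);
	model_ops[model_nr_ops++] = ops;
}

/* Walk the table in reverse registration order, skipping NULL hooks. */
static void model_syscore_shutdown(void)
{
	for (size_t i = model_nr_ops; i-- > 0; )
		if (model_ops[i]->shutdown)
			model_ops[i]->shutdown();
}

/* Stand-in for kvm_shutdown(): turns the modelled virtualisation off. */
static void model_kvm_shutdown(void)
{
	model_virt_enabled = false;
}

static struct model_syscore_ops model_kvm_ops = {
	.shutdown = model_kvm_shutdown,
};

/*
 * Models the kexec sequence with and without the one-line fix.
 * Returns true if the next kernel would start with virtualisation off.
 */
static bool model_kexec(bool with_syscore_shutdown)
{
	model_virt_enabled = true;
	if (with_syscore_shutdown)
		model_syscore_shutdown();
	return !model_virt_enabled;
}
```

After registering model_kvm_ops, model_kexec(false) corresponds to the flow
before the patch (the hook is skipped and virtualisation stays enabled), and
model_kexec(true) corresponds to the flow with syscore_shutdown() added.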