RE: [PATCH] x86/hyperv: Fix kexec panic/hang issues

From: Dexuan Cui
Date: Mon Dec 21 2020 - 20:05:35 EST


> From: Michael Kelley
> Sent: Monday, December 21, 2020 3:33 PM
> From: Dexuan Cui
> Sent: Sunday, December 20, 2020 2:30 PM
> >
> > Currently the kexec kernel can panic or hang due to 2 causes:
> >
> > 1) hv_cpu_die() is not called upon kexec, so the hypervisor corrupts the
> > VP Assist Pages when the kexec kernel runs. We ever fixed the same issue
>
> Spurious word "ever". And avoid first person "we". Perhaps:
>
> The same issue is fixed for hibernation in commit ..... . Now fix it for
> kexec.

Thanks! Will use this in v2.

> > for hibernation in
> > commit 421f090c819d ("x86/hyperv: Suspend/resume the VP assist page for
> hibernation")
> > and now let's fix it for kexec.
>
> Is the VP Assist Page getting cleared anywhere on the panic path? We can

It's not.

> only do it for the CPU that runs panic(), but I don't think it is getting cleared
> even for that CPU. It is cleared only in hv_cpu_die(), and that's not called on
> the panic path.

IMO kdump is different from the non-kdump kexec in that the kdump kernel
runs without depending on the memory used by the first kernel, so it looks
unnecessary to clear the first kernel's VP Assist Page (and the hypercallpage).
According to my test, the second kernel can re-enble the VP Asist Page and
the hypercall page using different GPAs, without disabling the old pages first.
Of course, in the future Hyper-V may require the guest to disable the pages first
before trying to re-enabling them, so I agree we'd better clear the pages in the
first kernell like this:

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 4638a52d8eae..8022f51c9c05 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -202,7 +202,7 @@ void clear_hv_tscchange_cb(void)
}
EXPORT_SYMBOL_GPL(clear_hv_tscchange_cb);

-static int hv_cpu_die(unsigned int cpu)
+int hv_cpu_die(unsigned int cpu)
{
struct hv_reenlightenment_control re_ctrl;
unsigned int new_cpu;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 30f76b966857..d090e781d216 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -76,6 +76,8 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {}
#if IS_ENABLED(CONFIG_HYPERV)
extern int hyperv_init_cpuhp;

+int hv_cpu_die(unsigned int cpu);
+
extern void *hv_hypercall_pg;
extern void __percpu **hyperv_pcpu_input_arg;

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 43b54bef5448..e54f8262bfe0 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -156,6 +156,9 @@ static void hv_machine_crash_shutdown(struct pt_regs *regs)
if (hv_crash_handler)
hv_crash_handler(regs);

+ /* Only call this on the faulting CPU. */
+ hv_cpu_die(raw_smp_processor_id());
+
/* The function calls crash_smp_send_stop(). */
native_machine_crash_shutdown(regs);

> > 2) hyperv_cleanup() is called too early. In the kexec path, the other CPUs
> > are stopped in hv_machine_shutdown() -> native_machine_shutdown(), so
> > between hv_kexec_handler() and native_machine_shutdown(), the other
> CPUs
> > can still try to access the hypercall page and cause panic. The workaround
> > "hv_hypercall_pg = NULL;" in hyperv_cleanup() can't work reliably.
>
> I would note that the comment in hv_suspend() is also incorrect on this
> point. Setting hv_hypercall_pg to NULL does not cause subsequent
> hypercalls to fail safely. The fast hypercalls don't test for it, and even if they
> did test like hv_do_hypercall(), the test just creates a race condition.

The comment in hv_suspend() should be correct because hv_suspend()
is only called during hibernation from the syscore_ops path where only
one CPU is active, e.g. for the suspend operation, it's called from
state_store
hibernate
hibernation_snapshot
create_image
suspend_disable_secondary_cpus
syscore_suspend
hv_suspend

It's similar for the resume operation:
resume_store
software_resume
load_image_and_restore
hibernation_restore
resume_target_kernel
hibernate_resume_nonboot_cpu_disable
syscore_suspend
hv_suspend

> > static void hv_machine_crash_shutdown(struct pt_regs *regs)
> > {
> > if (hv_crash_handler)
> > hv_crash_handler(regs);
> > +
> > + /* The function calls crash_smp_send_stop(). */
>
> Actually, crash_smp_send_stop() or smp_send_stop() has already been
> called earlier by panic(),

This is true only when the Hyper-V host supports the feature
HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE. On an old Hyper-V host
without the feature, ms_hyperv_init_platform() doesn't set
crash_kexec_post_notifiers, so crash_kexec_post_notifiers keeps its
initial value "false", and panic() calls smp_send_stop() *after*
__crash_kexec() (which calls machine_crash_shutdown() ->
hv_machine_crash_shutdown()).

> so there's already only a single CPU running at
> this point. crash_smp_send_stop() is called again in
> native_machine_crash_shutdown(), but it has a flag to prevent it from
> running again.
>
> > native_machine_crash_shutdown(regs);
> > +
> > + /* Disable the hypercall page when there is only 1 active CPU. */
> > + hyperv_cleanup();
>
> Moving the call to hyperv_cleanup() in the panic path is OK, and it makes
> the panic and kexec() paths more similar, but I don't think it is necessary.
> As noted above, the other CPUs have already been stopped, so they shouldn't
> be executing any hypercalls.

As I explained above, it's necessary for old Hyper-V hosts. :-)

Thanks,
-- Dexuan