RE: [PATCH] x86/hyperv: Fix kexec panic/hang issues

From: Michael Kelley
Date: Mon Dec 21 2020 - 22:39:36 EST


From: Dexuan Cui <decui@xxxxxxxxxxxxx> Sent: Monday, December 21, 2020 5:04 PM
>
> > From: Michael Kelley
> > Sent: Monday, December 21, 2020 3:33 PM
> > From: Dexuan Cui
> > Sent: Sunday, December 20, 2020 2:30 PM
> > >
> > > Currently the kexec kernel can panic or hang due to 2 causes:
> > >
> > > 1) hv_cpu_die() is not called upon kexec, so the hypervisor corrupts the
> > > VP Assist Pages when the kexec kernel runs. We ever fixed the same issue
> >
> > Spurious word "ever". And avoid first person "we". Perhaps:
> >
> > The same issue is fixed for hibernation in commit ..... . Now fix it for
> > kexec.
>
> Thanks! Will use this in v2.
>
> > > for hibernation in
> > > commit 421f090c819d ("x86/hyperv: Suspend/resume the VP assist page for
> > hibernation")
> > > and now let's fix it for kexec.
> >
> > Is the VP Assist Page getting cleared anywhere on the panic path? We can
>
> It's not.
>
> > only do it for the CPU that runs panic(), but I don't think it is getting cleared
> > even for that CPU. It is cleared only in hv_cpu_die(), and that's not called on
> > the panic path.
>
> IMO kdump is different from the non-kdump kexec in that the kdump kernel
> runs without depending on the memory used by the first kernel, so it looks
> unnecessary to clear the first kernel's VP Assist Page (and the hypercallpage).
> According to my test, the second kernel can re-enble the VP Asist Page and
> the hypercall page using different GPAs, without disabling the old pages first.

Ah yes. You are right. The kdump kernel must be using a disjoint area of
physical memory, so not clearing these per-CPU overlay pages shouldn't
put the kdump kernel at risk.

> Of course, in the future Hyper-V may require the guest to disable the pages first
> before trying to re-enabling them, so I agree we'd better clear the pages in the
> first kernell like this:
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 4638a52d8eae..8022f51c9c05 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -202,7 +202,7 @@ void clear_hv_tscchange_cb(void)
> }
> EXPORT_SYMBOL_GPL(clear_hv_tscchange_cb);
>
> -static int hv_cpu_die(unsigned int cpu)
> +int hv_cpu_die(unsigned int cpu)
> {
> struct hv_reenlightenment_control re_ctrl;
> unsigned int new_cpu;
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 30f76b966857..d090e781d216 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -76,6 +76,8 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {}
> #if IS_ENABLED(CONFIG_HYPERV)
> extern int hyperv_init_cpuhp;
>
> +int hv_cpu_die(unsigned int cpu);
> +
> extern void *hv_hypercall_pg;
> extern void __percpu **hyperv_pcpu_input_arg;
>
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 43b54bef5448..e54f8262bfe0 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -156,6 +156,9 @@ static void hv_machine_crash_shutdown(struct pt_regs *regs)
> if (hv_crash_handler)
> hv_crash_handler(regs);
>
> + /* Only call this on the faulting CPU. */
> + hv_cpu_die(raw_smp_processor_id());
> +
> /* The function calls crash_smp_send_stop(). */
> native_machine_crash_shutdown(regs);

Since we don't *need* to do this, I think it might be less risky to just leave
the code "as is". But I'm OK either way.

>
> > > 2) hyperv_cleanup() is called too early. In the kexec path, the other CPUs
> > > are stopped in hv_machine_shutdown() -> native_machine_shutdown(), so
> > > between hv_kexec_handler() and native_machine_shutdown(), the other
> > CPUs
> > > can still try to access the hypercall page and cause panic. The workaround
> > > "hv_hypercall_pg = NULL;" in hyperv_cleanup() can't work reliably.
> >
> > I would note that the comment in hv_suspend() is also incorrect on this
> > point. Setting hv_hypercall_pg to NULL does not cause subsequent
> > hypercalls to fail safely. The fast hypercalls don't test for it, and even if they
> > did test like hv_do_hypercall(), the test just creates a race condition.
>
> The comment in hv_suspend() should be correct because hv_suspend()
> is only called during hibernation from the syscore_ops path where only
> one CPU is active, e.g. for the suspend operation, it's called from
> state_store
> hibernate
> hibernation_snapshot
> create_image
> suspend_disable_secondary_cpus
> syscore_suspend
> hv_suspend
>
> It's similar for the resume operation:
> resume_store
> software_resume
> load_image_and_restore
> hibernation_restore
> resume_target_kernel
> hibernate_resume_nonboot_cpu_disable
> syscore_suspend
> hv_suspend

I agree the hv_suspend() code is correct. I read the second sentence of
the comment as being a more general statement that hypercalls could be
cleanly stopped by setting hv_hypercall_pg to NULL, which isn't true.

>
> > > static void hv_machine_crash_shutdown(struct pt_regs *regs)
> > > {
> > > if (hv_crash_handler)
> > > hv_crash_handler(regs);
> > > +
> > > + /* The function calls crash_smp_send_stop(). */
> >
> > Actually, crash_smp_send_stop() or smp_send_stop() has already been
> > called earlier by panic(),
>
> This is true only when the Hyper-V host supports the feature
> HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE. On an old Hyper-V host
> without the feature, ms_hyperv_init_platform() doesn't set
> crash_kexec_post_notifiers, so crash_kexec_post_notifiers keeps its
> initial value "false", and panic() calls smp_send_stop() *after*
> __crash_kexec() (which calls machine_crash_shutdown() ->
> hv_machine_crash_shutdown()).

OK, I see your point.

>
> > so there's already only a single CPU running at
> > this point. crash_smp_send_stop() is called again in
> > native_machine_crash_shutdown(), but it has a flag to prevent it from
> > running again.
> >
> > > native_machine_crash_shutdown(regs);
> > > +
> > > + /* Disable the hypercall page when there is only 1 active CPU. */
> > > + hyperv_cleanup();
> >
> > Moving the call to hyperv_cleanup() in the panic path is OK, and it makes
> > the panic and kexec() paths more similar, but I don't think it is necessary.
> > As noted above, the other CPUs have already been stopped, so they shouldn't
> > be executing any hypercalls.
>
> As I explained above, it's necessary for old Hyper-V hosts. :-)

Agreed.