RE: [PATCH 0/4] x86/Hyper-V: Panic code path fixes

From: Michael Kelley
Date: Thu Mar 19 2020 - 12:07:36 EST


From: Michael Kelley <mikelley@xxxxxxxxxxxxx> Sent: Thursday, March 19, 2020 8:15 AM
>
> From: Tianyu Lan <Tianyu.Lan@xxxxxxxxxxxxx> Sent: Thursday, March 19, 2020 7:08 AM
> > >>
> > >> This patchset fixes some issues in the Hyper-V panic code path.
> > >> Patch 1 resolves an issue where the panicked system still responds
> > >> to network packets.
> > >> Patches 2-3 resolve crash enlightenment issues.
> > >> Patch 4 sets crash_kexec_post_notifiers to true for Hyper-V VMs
> > >> in order to report crash data or kmsg to the host before running
> > >> the kdump kernel.
> > >
> > > I still see an issue that isn't addressed by these patches. The VMbus
> > > driver registers a "die notifier" and a "panic notifier". But die() will
> > > eventually call panic() if panic_on_oops is set (which I think it typically
> > > is). If the CRASH_NOTIFY_MSG option is *not* enabled, then
> > > hyperv_report_panic() could get called by the die notifier, and then
> > > again by the panic notifier.
> > >
> > > Do we even need the "die notifier"? If it was removed, there would
> > > not be any notification to Hyper-V via the die() path unless panic_on_oops
> > > is set, which I think is actually the correct behavior. I'm not
> > > completely clear on what is supposed to happen in general to the
> > > Linux kernel if panic_on_oops is not set. Does it try to continue to run?
> > > If so, then we should not be notifying Hyper-V if panic_on_oops is not
> > > set, and removing the die notifier is the right thing to do.
> > >
> >
> > hyperv_report_panic() has a re-entry check inside, so the kernel only
> > reports crash register data once during die().
>
> Ah, yes, you are right.
>
> > According to the comment in
> > hyperv_report_panic(), the register values reported in the die chain
> > are more exact than those reported in the panic chain. The register
> > values in the die chain are passed in by the die() caller, while the
> > register values reported in the panic chain are collected in
> > hyperv_panic_event().
> >
> > If panic_on_oops is not set, the offending task is killed and the
> > kernel keeps running. In this case, we may not trigger crash
> > enlightenment.
>
> I'm not completely clear on your last statement. It seems like there
> is still a problem in that die() will call hyperv_report_panic() even if
> panic_on_oops is not set. We will have reported a panic to Hyper-V
> even though the VM did not stop running.
>
> Michael
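
To pin down the die/panic flow discussed above, here is a minimal userspace model in C. Everything in it is hypothetical: fake_regs stands in for struct pt_regs, panic_on_oops_sim for panic_on_oops, and report_panic_sim() for hyperv_report_panic() with its re-entry check. The in_die/panic_on_oops gate is just one possible way to avoid reporting an oops that doesn't panic; it is an assumption, not something taken from this patch set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pt_regs: only the instruction pointer. */
struct fake_regs {
	unsigned long ip;
};

/* Hypothetical stand-ins for kernel state; none of these are kernel symbols. */
static int panic_on_oops_sim = 1;
static bool panic_reported;
static int reports_sent;
static unsigned long reported_ip;

/* Models hyperv_report_panic(): the first caller wins, later calls return. */
static void report_panic_sim(const struct fake_regs *regs, bool in_die)
{
	assert(regs != NULL);

	/*
	 * Assumed gate, not from this patch set: an oops that will not
	 * panic is not reported. It returns before panic_reported is set,
	 * so a later real panic can still be reported.
	 */
	if (in_die && !panic_on_oops_sim)
		return;

	/* Re-entry check: report only once. */
	if (panic_reported)
		return;
	panic_reported = true;

	reports_sent++;
	reported_ip = regs->ip;
}

/* die chain: registers are passed in by the die() caller. */
static void die_notifier_sim(const struct fake_regs *caller_regs)
{
	report_panic_sim(caller_regs, true);
}

/* panic chain: registers are collected inside the notifier itself. */
static void panic_notifier_sim(void)
{
	struct fake_regs here = { .ip = 0xdead };	/* placeholder capture */

	report_panic_sim(&here, false);
}

/* Resets the simulated state between scenarios. */
static void reset_sim(void)
{
	panic_reported = false;
	reports_sent = 0;
	reported_ip = 0;
}
```

Calling die_notifier_sim() and then panic_notifier_sim() with panic_on_oops_sim = 1 models die() escalating to panic(): exactly one report is sent, carrying the die() caller's registers. With panic_on_oops_sim = 0 the die call sends nothing, and only a subsequent real panic is reported.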

There's one more issue to consider. hv_kmsg_dump() skips calling
hyperv_report_panic_msg() if sysctl_record_panic_msg has been cleared
by a sysctl command. (This sysctl option gives a customer the ability to
increase privacy by not having the VM's dmesg contents sent to Hyper-V.)
In this case, the earlier hyperv_report_panic() call should be used. Otherwise,
there would not be any notification to Hyper-V about the panic.
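
The same point as a minimal userspace model, under stated assumptions: crash_msg_enabled stands in for whether the CRASH_NOTIFY_MSG path is in use at all, record_panic_msg for sysctl_record_panic_msg, and the two non-NONE enum values for the hyperv_report_panic() and hyperv_report_panic_msg() notifications. All function names are hypothetical, and panic_path_without_fallback() models the behavior described above rather than quoting the patch code.

```c
#include <assert.h>
#include <stdbool.h>

/* Which notification (if any) reaches the simulated host. */
enum host_notice {
	NOTICE_NONE,		/* host never hears about the panic */
	NOTICE_CRASH_REGS,	/* models hyperv_report_panic() */
	NOTICE_CRASH_MSG,	/* models hyperv_report_panic_msg() */
};

/* Models the early exit in hv_kmsg_dump(): no message if the sysctl is 0. */
static bool kmsg_dump_will_report(bool crash_msg_enabled, int record_panic_msg)
{
	return crash_msg_enabled && record_panic_msg != 0;
}

/* Scenario above: with the message option enabled, rely on kmsg_dump alone. */
static enum host_notice panic_path_without_fallback(bool crash_msg_enabled,
						    int record_panic_msg)
{
	if (crash_msg_enabled)
		return kmsg_dump_will_report(crash_msg_enabled, record_panic_msg) ?
		       NOTICE_CRASH_MSG : NOTICE_NONE;
	return NOTICE_CRASH_REGS;
}

/* Fallback: if kmsg_dump will skip its report, send the register report. */
static enum host_notice panic_path_with_fallback(bool crash_msg_enabled,
						 int record_panic_msg)
{
	if (kmsg_dump_will_report(crash_msg_enabled, record_panic_msg))
		return NOTICE_CRASH_MSG;
	return NOTICE_CRASH_REGS;
}

/* Every combination of the two settings must notify the host somehow. */
static void check_every_panic_is_reported(void)
{
	for (int msg = 0; msg <= 1; msg++)
		for (int record = 0; record <= 1; record++)
			assert(panic_path_with_fallback(msg != 0, record) != NOTICE_NONE);
}
```

check_every_panic_is_reported() walks all four combinations of the two settings and asserts that the fallback version always notifies the host, whereas panic_path_without_fallback(true, 0) yields NOTICE_NONE, which is the privacy-sysctl case raised above.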

Michael