RE: [PATCH 3/3] Drivers: hv: vmbus: Make panic reporting to be more useful

From: KY Srinivasan
Date: Thu Oct 12 2017 - 11:07:08 EST




> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
> Sent: Thursday, October 12, 2017 6:40 AM
> To: kys@xxxxxxxxxxxxxxxxxxxxxx
> Cc: gregkh@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> devel@xxxxxxxxxxxxxxxxxxxxxx; olaf@xxxxxxxxx; apw@xxxxxxxxxxxxx;
> jasowang@xxxxxxxxxx; leann.ogasawara@xxxxxxxxxxxxx;
> marcelo.cerri@xxxxxxxxxxxxx; Stephen Hemminger
> <sthemmin@xxxxxxxxxxxxx>; KY Srinivasan <kys@xxxxxxxxxxxxx>
> Subject: Re: [PATCH 3/3] Drivers: hv: vmbus: Make panic reporting to be
> more useful
>
> kys@xxxxxxxxxxxxxxxxxxxxxx writes:
>
> > From: "K. Y. Srinivasan" <kys@xxxxxxxxxxxxx>
> >
> > Hyper-V allows the guest to report panic and the guest can pass additional
> > information. All this is logged on the host. Currently Linux is passing back
> > information that is not particularly useful. Make the following changes:
> >
> > 1. Windows uses crash MSR P0 to report bugcheck code. Follow the same
> > convention for Linux as well.
> > 2. It will be useful to know the guest ID of the Linux guest that has
> > panicked. Pass back this information.
> >
> > These changes will help in better supporting Linux on Hyper-V.
> >
> > Signed-off-by: K. Y. Srinivasan <kys@xxxxxxxxxxxxx>
> > ---
> > arch/x86/hyperv/hv_init.c | 11 +++++++----
> > arch/x86/include/asm/mshyperv.h | 2 +-
> > drivers/hv/vmbus_drv.c | 4 ++--
> > 3 files changed, 10 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index 1a8eb550c40f..cc30a094fb7c 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -205,9 +205,10 @@ void hyperv_cleanup(void)
> > }
> > EXPORT_SYMBOL_GPL(hyperv_cleanup);
> >
> > -void hyperv_report_panic(struct pt_regs *regs)
> > +void hyperv_report_panic(struct pt_regs *regs, long err)
> > {
> > static bool panic_reported;
> > + u64 guest_id;
> >
> > /*
> > * We prefer to report panic on 'die' chain as we have proper
> > @@ -218,9 +219,11 @@ void hyperv_report_panic(struct pt_regs *regs)
> > return;
> > panic_reported = true;
> >
> > - wrmsrl(HV_X64_MSR_CRASH_P0, regs->ip);
> > - wrmsrl(HV_X64_MSR_CRASH_P1, regs->ax);
> > - wrmsrl(HV_X64_MSR_CRASH_P2, regs->bx);
> > + rdmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
> > +
> > + wrmsrl(HV_X64_MSR_CRASH_P0, err);
> > + wrmsrl(HV_X64_MSR_CRASH_P1, guest_id);
>
> This is a constant we write in hyperv_init() (0x810000040e000000 for
> Linux guests). Do I get it right that we need this to basically
> distinguish Windows guest crashes from Linux guest crashes in the log?

Yes; we have a large infrastructure for analyzing the event logs on Azure hosts, and guest panics
are logged into this pipeline. Unfortunately, there is no simple way to distinguish between
Windows and Linux guests in this information stream.
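
As a rough illustration of how a log consumer could use the guest ID in P1,
here is a userspace sketch that splits the value into fields. The field layout
(bit 63 = open-source flag, bits 62:56 = OS type with 1 meaning Linux,
bits 47:16 = kernel version code) is an assumption based on the guest OS ID
convention, and the struct and function names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields (names are illustrative). */
struct guest_id_fields {
	int      open_source;  /* bit 63: 1 if the guest OS is open source */
	unsigned os_type;      /* bits 62:56: assumed 1 for Linux */
	unsigned distro_info;  /* bits 55:48: distro-specific */
	uint32_t version;      /* bits 47:16: assumed LINUX_VERSION_CODE */
	unsigned build;        /* bits 15:0: distro-specific */
};

/* Split a 64-bit guest OS ID into its assumed fields. */
static struct guest_id_fields decode_guest_id(uint64_t id)
{
	struct guest_id_fields f;

	f.open_source = (int)(id >> 63);
	f.os_type     = (unsigned)((id >> 56) & 0x7f);
	f.distro_info = (unsigned)((id >> 48) & 0xff);
	f.version     = (uint32_t)((id >> 16) & 0xffffffffu);
	f.build       = (unsigned)(id & 0xffff);
	return f;
}

/* Under the assumed layout, a Linux guest has the open-source bit set
 * and OS type 1. */
static int guest_id_is_linux(uint64_t id)
{
	struct guest_id_fields f = decode_guest_id(id);

	return f.open_source && f.os_type == 1;
}
```

With the constant quoted above, decode_guest_id(0x810000040e000000) yields a
version field of 0x040e00, which reads as 4.14.0 if the field holds a
LINUX_VERSION_CODE-style value (major << 16 | minor << 8 | patch).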
>
> > + wrmsrl(HV_X64_MSR_CRASH_P2, regs->ip);
> > wrmsrl(HV_X64_MSR_CRASH_P3, regs->cx);
> > wrmsrl(HV_X64_MSR_CRASH_P4, regs->dx);
>
> We don't write ax and bx any more but write cx and dx. Not that I see
> these regs as really useful in the log but I'd change these to ax and bx
> for consistency. Or maybe make it sp and ax?

I will make it sp and ax.
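
For illustration only, a userspace model of the resulting parameter layout.
This is a sketch, not the kernel code: the struct and function names are
hypothetical, plain array stores stand in for wrmsrl(), and the order of
sp and ax across P3 and P4 is an assumption. The once-only guard mirrors
the panic_reported flag in the patch:

```c
#include <stdint.h>

/* Indices standing in for HV_X64_MSR_CRASH_P0..P4 (model only). */
enum { CRASH_P0, CRASH_P1, CRASH_P2, CRASH_P3, CRASH_P4, CRASH_NPARAMS };

/* Hypothetical subset of struct pt_regs used by the report. */
struct model_regs {
	uint64_t ip, sp, ax;
};

/* Stand-in for the five crash MSRs plus the "already reported" flag. */
struct crash_report {
	uint64_t p[CRASH_NPARAMS];
	int reported;
};

/*
 * Fill the crash parameters once. Returns 1 if the report was written,
 * 0 if a panic had already been reported (later calls are ignored).
 */
static int model_report_panic(struct crash_report *r,
			      const struct model_regs *regs,
			      long err, uint64_t guest_id)
{
	if (r->reported)
		return 0;
	r->reported = 1;

	r->p[CRASH_P0] = (uint64_t)err;  /* error/bugcheck-style code */
	r->p[CRASH_P1] = guest_id;       /* identifies the guest OS */
	r->p[CRASH_P2] = regs->ip;
	r->p[CRASH_P3] = regs->sp;       /* assumed order: sp, then ax */
	r->p[CRASH_P4] = regs->ax;
	return 1;
}
```

A second call on the same crash_report leaves the first set of values in
place, which matches the intent of preferring the first reporter.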

Thanks,

K. Y
>
> >
> > diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> > index 63cc96f064dc..dd2dc54ddf20 100644
> > --- a/arch/x86/include/asm/mshyperv.h
> > +++ b/arch/x86/include/asm/mshyperv.h
> > @@ -311,7 +311,7 @@ static inline int hv_cpu_number_to_vp_number(int cpu_number)
> > void hyperv_init(void);
> > void hyperv_setup_mmu_ops(void);
> > void hyper_alloc_mmu(void);
> > -void hyperv_report_panic(struct pt_regs *regs);
> > +void hyperv_report_panic(struct pt_regs *regs, long err);
> > bool hv_is_hypercall_page_setup(void);
> > void hyperv_cleanup(void);
> > #else /* CONFIG_HYPERV */
> > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> > index e8cd19095212..852f6c628836 100644
> > --- a/drivers/hv/vmbus_drv.c
> > +++ b/drivers/hv/vmbus_drv.c
> > @@ -65,7 +65,7 @@ static int hyperv_panic_event(struct notifier_block *nb, unsigned long val,
> >
> > regs = current_pt_regs();
> >
> > - hyperv_report_panic(regs);
> > + hyperv_report_panic(regs, val);
> > return NOTIFY_DONE;
> > }
> >
> > @@ -75,7 +75,7 @@ static int hyperv_die_event(struct notifier_block *nb, unsigned long val,
> > struct die_args *die = (struct die_args *)args;
> > struct pt_regs *regs = die->regs;
> >
> > - hyperv_report_panic(regs);
> > + hyperv_report_panic(regs, val);
> > return NOTIFY_DONE;
> > }
>
> --
> Vitaly