Re: pstore does not work under xen
From: James Dingwall
Date: Mon Sep 23 2019 - 11:42:34 EST

On Thu, Sep 19, 2019 at 12:37:40PM -0400, Boris Ostrovsky wrote:
> On 9/19/19 12:14 PM, James Dingwall wrote:
> > On Thu, Sep 19, 2019 at 03:51:33PM +0000, Luck, Tony wrote:
> >>> I have been investigating a regression in our environment where pstore
> >>> (efi-pstore specifically but I suspect this would affect all
> >>> implementations) no longer works after upgrading from a 4.4 to 5.0
> >>> kernel when running under Xen. (This is an Ubuntu kernel but I don't
> >>> think there are patches which affect this area.)
> >> I don't have any answer for this ... but want to throw out the idea that
> >> VMM systems could provide some hypercalls to guests to save/return
> >> some blob of memory (perhaps the "save" triggers automagically if the
> >> guest crashes?).
> >>
> >> That would provide a much better pstore back end than relying on emulation
> >> of EFI persistent variables (which have severe constraints on size, and don't
> >> support some pstore modes because you can't dynamically update EFI variables
> >> hundreds of times per second).
> >>
> > For clarification, this is a dom0 crash rather than an HVM guest with EFI. I
> > should probably have also mentioned that the Xen version has changed from 4.8.4
> > to 4.11.2, in case its behaviour on detection of a crashed domain has changed.
> >
> > (For capturing guest crashes we have enabled xenconsole logging so the
> > hvc0 log is available in dom0.)
>
>
> Do you only see this difference between 4.4 and 5.0 when you crash via
> sysrq?
>
> Because that's where things changed. On 4.4 we seem to be forcing an
> oops, which eventually calls kmsg_dump() and then panic. On 5.0 we call
> panic() directly from sysrq handler. And because Xen's panic notifier
> doesn't return we never get a chance to call kmsg_dump().
>
OK, I see that change in 8341f2f222d729688014ce8306727fdb9798d37e. I
hadn't tested any crash method other than sysrq before. Using the NULL
pointer dereference module code at [1], a pstore record is generated as
expected when the module is loaded (with panic_on_oops=1).

I have also tested swapping the order of the kmsg_dump() and
atomic_notifier_call_chain() calls in panic.c so that kmsg_dump() runs
first, and this also results in a pstore record being created with
sysrq-c. I don't know if that would be an acceptable solution though,
since other code may depend on the current ordering.
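
As a rough illustration of the ordering problem, here is a toy user-space
model in Python. It is not kernel code: the functions only borrow the names
of the kernel routines mentioned in this thread, and the NeverReturns
exception and the dump_before_notifiers flag are made up for the sketch, to
stand in for the Xen panic notifier not returning and for the swapped call
order.

```python
# Toy model only: plain Python functions named after the kernel routines
# discussed in this thread. This is not how the kernel implements them.


class NeverReturns(Exception):
    """Hypothetical stand-in for a panic notifier that never returns."""


def xen_panic_notifier(trace):
    trace.append("xen_panic_notifier")
    raise NeverReturns  # assumption: control does not come back to panic()


def kmsg_dump(trace):
    trace.append("kmsg_dump")  # the point where a pstore record would be written


def panic(trace, dump_before_notifiers=False):
    """Model of panic(): notifier chain and kmsg_dump() in a selectable order."""
    trace.append("panic")
    steps = [xen_panic_notifier, kmsg_dump]
    if dump_before_notifiers:
        steps.reverse()  # the swapped order tested above
    try:
        for step in steps:
            step(trace)
    except NeverReturns:
        pass  # nothing after the non-returning notifier gets to run
    return trace


def oops_then_panic(trace):
    """Model of the 4.4 sysrq-c path as described: oops, kmsg_dump(), panic()."""
    trace.append("oops")
    kmsg_dump(trace)
    return panic(trace)


if __name__ == "__main__":
    print(panic([]))                              # 5.0 sysrq-c path
    print(panic([], dump_before_notifiers=True))  # swapped order
    print(oops_then_panic([]))                    # 4.4 sysrq-c path
```

In this model, "kmsg_dump" is missing from the trace only in the first case,
which matches what I see: no pstore record with sysrq-c on 5.0, but a record
with either the oops path or the swapped order.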

James

[1] http://ubuntu.5.x6.nabble.com/How-To-Cause-An-Oops-td3681145.html