Re: pstore does not work under xen
From: Kees Cook
Date: Mon Sep 23 2019 - 18:59:13 EST

On Mon, Sep 23, 2019 at 03:42:27PM +0000, James Dingwall wrote:
> On Thu, Sep 19, 2019 at 12:37:40PM -0400, Boris Ostrovsky wrote:
> > On 9/19/19 12:14 PM, James Dingwall wrote:
> > > On Thu, Sep 19, 2019 at 03:51:33PM +0000, Luck, Tony wrote:
> > >>> I have been investigating a regression in our environment where pstore
> > >>> (efi-pstore specifically but I suspect this would affect all
> > >>> implementations) no longer works after upgrading from a 4.4 to 5.0
> > >>> kernel when running under xen. (This is an Ubuntu kernel but I don't
> > >>> think there are patches which affect this area.)
> > >> I don't have any answer for this ... but want to throw out the idea that
> > >> VMM systems could provide some hypercalls to guests to save/return
> > >> some blob of memory (perhaps the "save" triggers automagically if the
> > >> guest crashes?).
> > >>
> > >> That would provide a much better pstore back end than relying on emulation
> > >> of EFI persistent variables (which have severe constraints on size, and don't
> > >> support some pstore modes because you can't dynamically update EFI variables
> > >> hundreds of times per second).
> > >>
> > > For clarification this is a dom0 crash rather than an HVM guest with EFI. I
> > > should probably have also mentioned the xen version has changed from 4.8.4 to
> > > 4.11.2 in case its behaviour on detection of crashed domain has changed.
> > >
> > > (For capturing guest crashes we have enabled xenconsole logging so the
> > > hvc0 log is available in dom0.)
> >
> >
> > Do you only see this difference between 4.4 and 5.0 when you crash via
> > sysrq?
> >
> > Because that's where things changed. On 4.4 we seem to be forcing an
> > oops, which eventually calls kmsg_dump() and then panic. On 5.0 we call
> > panic() directly from sysrq handler. And because Xen's panic notifier
> > doesn't return we never get a chance to call kmsg_dump().
> >
>
> Ok, I see that change in 8341f2f222d729688014ce8306727fdb9798d37e. I
> hadn't tested it any other way before. Using the null pointer
> de-reference module code at [1] a pstore record is generated as expected
> when the module is loaded (panic_on_oops=1).

This change looks correct -- it just gets us directly to the panic()
state instead of exercising the various exception handlers.

> I have also tested swapping the kmsg_dump() /
> atomic_notifier_call_chain() around in panic.c and this also results in
> a pstore record being created with sysrq-c. I don't know if that would
> be an acceptable solution though since it may break behaviour that other
> things depend on.

I don't think reordering these is a good idea: as the comments say,
there might be work done in the notifier chain that kmsg_dump() will
want to capture (e.g. the KASLR base offset).

The situation seems to be that notifier callbacks must return -- I think
Xen needs fixing here.
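
To make the ordering concrete, here is a rough userspace model (not
kernel code; every name in it is made up for illustration, and longjmp()
merely stands in for a notifier that never hands control back):

```c
/* Userspace model only: NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

#define MAX_NOTIFIERS 4

typedef void (*notifier_fn)(void);

static notifier_fn chain[MAX_NOTIFIERS];
static size_t chain_len;
static int kmsg_dump_ran;
static jmp_buf machine_gone;	/* stands in for "control left the kernel" */

static void model_register(notifier_fn fn)
{
	assert(chain_len < MAX_NOTIFIERS);
	chain[chain_len++] = fn;
}

/* A well-behaved notifier: does its work and returns. */
void returning_notifier(void)
{
}

/* Models a notifier that hands control away and never returns. */
void non_returning_notifier(void)
{
	longjmp(machine_gone, 1);
}

static void model_kmsg_dump(void)
{
	kmsg_dump_ran = 1;
}

/* Same ordering as discussed above: notifiers first, then the dump. */
static void model_panic(void)
{
	for (size_t i = 0; i < chain_len; i++)
		chain[i]();
	model_kmsg_dump();
}

/* Returns 1 if the dump step was reached, 0 otherwise. */
int run_model(notifier_fn fn)
{
	chain_len = 0;
	kmsg_dump_ran = 0;
	model_register(fn);
	if (setjmp(machine_gone) == 0)
		model_panic();
	return kmsg_dump_ran;
}
```

In this model run_model(returning_notifier) reaches the dump step, while
run_model(non_returning_notifier) does not, which mirrors the sysrq-c
behaviour described above.
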
--
Kees Cook