Re: Issue with panic handling and ipmi
From: Anton Lundin
Date: Fri Sep 17 2021 - 08:55:34 EST
On 17 September, 2021 - Corey Minyard wrote:
> On Fri, Sep 17, 2021 at 12:14:19PM +0200, Anton Lundin wrote:
> > On 16 September, 2021 - Corey Minyard wrote:
> >
> > > On Thu, Sep 16, 2021 at 04:53:00PM +0200, Anton Lundin wrote:
> > > > Hi.
> > > >
> > > > I've just done an upgrade of the kernel we're using in a product from
> > > > 4.19 to 5.10 and I noted an issue.
> > > >
> > > > It started with us not getting panic and oops dumps in our ERST-backed
> > > > pstore, and when debugging that I noted that the reboot-on-panic
> > > > timer didn't work either.
> > > >
> > > > I've bisected it down to 2033f6858970 ("ipmi: Free receive messages when
> > > > in an oops").
> > >
> > > Hmm. Unfortunately removing that will break other things. Can you try
> > > the following patch? It's a good idea, in general, to do as little as
> > > possible in the panic path, this should cover a multitude of issues.
> > >
> > > Thanks for the report.
> > >
> >
> > I'm sorry to report that the patch didn't solve the issue, and the
> > machine locked up in the panic path as before.
>
> I missed something. Can you try the following? If this doesn't work,
> I'm going to have to figure out how to reproduce this.
>
Sorry, still no joy.

My guess is that something is locking up because these Supermicro
machines have their ERST memory backed by the BMC, and the same BMC is
the other end of all the ipmi communications.

I've reproduced this on Server/X11SCZ-F and Server/H11SSL-i but I'm
guessing it can be reproduced on most, if not all, of their hardware
with the same setup.

We're using the ERST backend for pstore, because we're still
BIOS-booting them and don't have EFI services available to use as a
pstore backend.

I've tested just yanking the ipmi modules out of the kernel, and that
fixes the panic timer and we get crash dumps to pstore.

//Anton