Re: [Openipmi-developer] Issue with panic handling and ipmi

From: Corey Minyard
Date: Fri Sep 17 2021 - 09:19:36 EST


On Fri, Sep 17, 2021 at 02:55:25PM +0200, Anton Lundin wrote:
> On 17 September, 2021 - Corey Minyard wrote:
>
> > On Fri, Sep 17, 2021 at 12:14:19PM +0200, Anton Lundin wrote:
> > > On 16 September, 2021 - Corey Minyard wrote:
> > >
> > > > On Thu, Sep 16, 2021 at 04:53:00PM +0200, Anton Lundin wrote:
> > > > > Hi.
> > > > >
> > > > > I've just upgraded the kernel we're using in a product from
> > > > > 4.19 to 5.10 and I noticed an issue.
> > > > >
> > > > > It started with us not getting panic and oops dumps in our
> > > > > ERST-backed pstore, and while debugging that I noticed that the
> > > > > reboot-on-panic timer didn't work either.
> > > > >
> > > > > I've bisected it down to 2033f6858970 ("ipmi: Free receive messages when
> > > > > in an oops").
> > > >
> > > > Hmm. Unfortunately removing that will break other things. Can you try
> > > > the following patch? It's a good idea, in general, to do as little as
> > > > possible in the panic path; this should cover a multitude of issues.
> > > >
> > > > Thanks for the report.
> > > >
> > >
> > > I'm sorry to report that the patch didn't solve the issue, and the
> > > machine locked up in the panic path as before.
> >
> > I missed something. Can you try the following? If this doesn't work,
> > I'm going to have to figure out how to reproduce this.
> >
>
> Sorry, still no joy.
>
> My guess is that something locks up because these Supermicro machines
> have their ERST memory backed by the BMC, and the same BMC is the
> other end of all the IPMI communications.
>
> I've reproduced this on Server/X11SCZ-F and Server/H11SSL-i but I'm
> guessing it can be reproduced on most, if not all, of their hardware
> with the same setup.
>
> We're using the ERST backend for pstore because we're still
> BIOS-booting them and don't have EFI services available to use as a
> pstore backend.
>
>
> I've tested just yanking the ipmi modules out of the kernel; that
> fixes the panic timer and we get crash dumps in pstore.

Dang. I'm going to have to look deeper at what that could change to
cause an issue like this. Are you using the IPMI watchdog? Do you have
CONFIG_IPMI_PANIC_EVENT=y set in the config?
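
For reference, a minimal sketch of one way to answer the config question
from the kernel config text. This is only an illustration, not part of
any kernel tooling: the function name ipmi_config_state is hypothetical,
and it assumes the config is available as plain text (on many
distributions a copy sits under /boot, but that location is an
assumption here). The option names are the ones from the kernel's IPMI
Kconfig.

```python
import re

# IPMI-related Kconfig symbols relevant to the questions above.
IPMI_OPTIONS = (
    "CONFIG_IPMI_HANDLER",
    "CONFIG_IPMI_PANIC_EVENT",
    "CONFIG_IPMI_PANIC_STRING",
    "CONFIG_IPMI_WATCHDOG",
)


def ipmi_config_state(config_text, options=IPMI_OPTIONS):
    """Return {option: value} for the given options.

    Options that are commented out ("# CONFIG_X is not set") or absent
    from the text are reported as "not set".
    """
    state = {name: "not set" for name in options}
    for line in config_text.splitlines():
        # Only lines of the form CONFIG_FOO=value carry an enabled setting.
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if match and match.group(1) in state:
            state[match.group(1)] = match.group(2)
    return state
```

Passing the contents of the running kernel's config file to
ipmi_config_state() gives "y", "m" or "not set" per option, which is
enough to tell whether the panic event and the watchdog are built in,
modular, or disabled.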

Thanks,

-corey

>
> //Anton