Re: [RFC] IPMI state machine regression
From: Andrew Banman
Date: Wed Aug 22 2018 - 12:24:00 EST
On Wed, Aug 22, 2018 at 11:14:52AM -0500, Corey Minyard wrote:
> On 08/21/2018 05:14 PM, Andrew Banman wrote:
> > Dear IPMI supporters,
> >
> > We observe a window in IPMI BT's opportunistic get capabilities request,
> > wherein GET_DEVICE_GUID and GET_DEVICE_ID requests may start while the BT state
> > machine is in WR_CONSUME. Following this, the 0xD5 error code is forced in
> > bt_start_transaction, IPMI fails to initialize, and the interface is torn down.
> > There is no mechanism to retry bringing up the interface in open() /dev/ipmi.
> > This leaves IPMI hosed until you reload modules. Looks to happen after we call
> > schedule().
>
> When was the latest kernel where this worked properly? Also, what hardware
> is this?

This is UV4.

The first known bad commit is below, though I am not sure whether the
timing issue predates it:

  commit aa9c9ab2443e3b9562c6c7cfc245a9e43b557d14
  Author: Jeremy Kerr <jk@xxxxxxxxxx>
  Date:   Fri Aug 25 15:47:24 2017 +0800

      ipmi: allow dynamic BMC version information

The problem hits less frequently on older kernels, so I did not notice
it until recently, when it became more frequent.

>
> BTW, you can use the "hotmod" capability of the IPMI driver to add the
> device dynamically.
>
> -corey
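
To make the window described above concrete, here is a toy model of the
failure as I understand it. It is only a sketch and not the driver code:
every name in it (toy_bt, toy_bt_start_transaction, toy_bt_finish,
TOY_NOT_IN_MY_STATE_ERR) is hypothetical, and the real state machine has
more states than the two shown. The only things taken from the report are
the WR_CONSUME state and the 0xD5 code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not the driver's. */
enum toy_bt_state {
	TOY_BT_IDLE,        /* no transaction in flight */
	TOY_BT_WR_CONSUME   /* request written, waiting for the BMC to consume it */
};

/* The 0xD5 completion code mentioned in the report. */
#define TOY_NOT_IN_MY_STATE_ERR 0xD5

struct toy_bt {
	enum toy_bt_state state;
};

/*
 * Try to start a new request. Returns 0 on success, or 0xD5 if a
 * previous request is still in flight (state is not idle).
 */
static int toy_bt_start_transaction(struct toy_bt *bt)
{
	assert(bt != NULL);

	if (bt->state != TOY_BT_IDLE)
		return TOY_NOT_IN_MY_STATE_ERR;

	bt->state = TOY_BT_WR_CONSUME;
	return 0;
}

/* Simplification: the toy model goes straight back to idle when done. */
static void toy_bt_finish(struct toy_bt *bt)
{
	assert(bt != NULL);
	bt->state = TOY_BT_IDLE;
}
```

In this model, a second toy_bt_start_transaction() issued before
toy_bt_finish() returns 0xD5, which corresponds to the case where
GET_DEVICE_ID or GET_DEVICE_GUID starts while the get capabilities
request is still in WR_CONSUME.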