Re: [PATCH V2] Change ACPI IPMI support to "default y"

From: Matthew Garrett
Date: Thu Feb 20 2014 - 19:13:54 EST


On Thu, 2014-02-20 at 17:59 -0600, Russ Anderson wrote:
> On Thu, Feb 20, 2014 at 11:09:42PM +0000, Matthew Garrett wrote:
> > On Thu, 2014-02-20 at 16:45 -0600, Russ Anderson wrote:
> > >
> > > The ACPI spec requires IPMI functionality before a module loads at
> > > boot time? And the kernel is *broken* if it does not support ACPI IPMI
> > > functionality before module load time? Really?
> >
> > There's no mechanism to ensure that IPMI support will be loaded before
> > ACPI calls attempt to access IPMI operation regions. Really.
>
> And no mechanism can be added to ensure that ACPI calls are
> not attempted before IPMI is initialized? A flag or lock
> or exported symbol indicating IPMI support is ready.

ACPI functions are a black box to drivers. You make an ACPI call, the
AML code does something. We could block there, but what's the driver
supposed to do at that point? The core could call out to a module
loader, but if the driver is built in and IPMI isn't then you'll end up
with a 60 second pause in boot and a driver that doesn't work.

> > > > ACPI 4.0 includes support for IPMI operation regions. Modular IPMI means
> > > > that the kernel will spend a significant amount of time (potentially
> > > > until a user manually loads a driver) failing to implement part of the
> > > > ACPI specification. That's a problem, and the correct fix is to ensure
> > > > that the kernel always implements IPMI support.
> > >
> > > The ACPI spec says ipmi_si cannot be a driver? Really?
> > > What is the real problem you are trying to solve?
> >
> > The most straightforward case is that of an ACPI power meter.
>
> So it is just a matter of making sure the ipmi_si module loads before
> the ACPI power meter module loads, right? A module dependency issue.

No, because the power meter driver has no way of knowing that a vendor
has implemented this interface via IPMI. *Any* ACPI entry point could
theoretically reference IPMI code, even the _INI method that's called
during ACPI core init. If it does, and if you don't have built-in IPMI
support, you'd fail ACPI initialisation and things would go downhill
from there.

(I don't think failure of this magnitude is actually *likely*, but it
would be spec-compliant)

> > I've repeatedly asked for you to provide detailed descriptions of the
> > problems you've seen because I have a genuine interest in fixing them.
> > If you're just going to childishly refuse then this discussion is
> > pointless.
>
> The distro cases I would point you at are marked private.
> And you do not have access to our internal support system.
> A simple google search for "kipmi0" shows a lot of reports of
> high cpu utilization.

And nobody seems to have put any effort into figuring out what the
underlying cause is. Is it spinning because there are messages? If so,
is it because the BMC would really like some kind of response to those
messages? Is it spinning because the BMC is wedged? If so, can we detect
that case, flag it as broken and cleanly disengage?

We're running systems from a wide range of vendors (including basically
all the Tier 1 server manufacturers, plus some whitebox), use IPMI
functionality heavily and genuinely do not see the described problems. I
don't think there's evidence of widespread breakage, and where it does
exist we should treat it as we would any other bug - diagnose the
underlying problem and fix it.

--
Matthew Garrett <matthew.garrett@xxxxxxxxxx>