Re: [RT] [RFC] simple SMI detector

From: Jon Masters
Date: Sat Jan 24 2009 - 19:57:28 EST


On Sat, 2009-01-24 at 17:30 +0100, Thomas Gleixner wrote:
> On Fri, 23 Jan 2009, Lee Revell wrote:
> >
> > FYI the RTAI project has a patch to allow disabling of SMI:
> >
> > https://listas.upv.es/pipermail/rtlinuxgpl/2007-April/000609.html
>
> FYI, this patch should be removed from the internet archives and the
> author should be made liable for the damage which can happen when
> innocent users try to use it for fixing things which simply can not be
> fixed.

Well, we disagree on the remedy there (I personally think "caveat
emptor", although European law might well differ in that regard), but I
think a warning is very much called for, especially where end users are
concerned. Let's be sure those following along the LKML archives
understand something:

1). You (the end user) don't want to disable SMIs on existing systems.
2). Doing so might prevent thermal management software, system
management software, BIOS implemented fixes, and other components from
operating.
3). BIOS vendors might need to be encouraged to reduce SMI overhead.
Once systems have been designed to rely on lots of SMI overhead, the
game has already been lost. No amount of user fiddling will adequately
fix that.

What we are interested in doing with things like smi_detector is
noticing when there is substantial overhead, so that we can politely ask
the vendors involved to implement alternative SMI handlers that have
less of a hit on RT performance - the only real solution involving no
SMIs is more expensive hardware with dedicated system management/thermal
management processors.

> >> .... disabled SMIs with RTAI code the latencies of cyclictest were
> >> good, but after about 20 minutes my system stopped working and now
> >> it does not even boot anymore. :( Any hint?
> >
> > Yeah. Buy a new CPU. You deep-fried your P4 because you disabled the
> > thermal protection.
>
> Aside of this worst case scenario, disabling SMIs is problematic as
> SMI is used to emulate devices, to fix chip bugs and necessary for
> enhanced features.
>
> The only reasonable thing you can do on a SMI plagued system is to
> identify the device which makes use of SMIs. Legacy ISA devices and
> USB are usually good candidates. If that does not help, don't use it
> for real-time :)

Indeed. This is why I wrote an smi_detector that sits in kernel space
and can be reasonably sure measured discrepancies are attributable to
SMI behavior. We want to log and detect such things before RT systems
are deployed, not have users actively trying to work around SMI overhead
after the fact.
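
For those following along, the general idea of spotting "stolen" time can
be sketched roughly as below. This is a user-space illustration only, not
the smi_detector code: it polls a clock in a tight loop and flags any gap
between consecutive reads that exceeds a threshold. The names gap_stats,
now_ns, scan_gaps and sample_and_scan are made up for this example, and
the threshold is whatever you choose to pass in. In user space a large gap
can just as easily come from preemption or ordinary interrupts, which is
exactly why the real detector sits in kernel space.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical result record for this sketch; not from smi_detector. */
struct gap_stats {
    size_t   count;       /* gaps at or above the threshold */
    uint64_t max_gap_ns;  /* largest gap between consecutive samples */
};

/* Read the monotonic clock in nanoseconds. */
uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Walk consecutive timestamps and record every gap >= threshold_ns. */
struct gap_stats scan_gaps(const uint64_t *t, size_t n, uint64_t threshold_ns)
{
    struct gap_stats s = { 0, 0 };

    assert(t != NULL || n == 0);

    for (size_t i = 1; i < n; i++) {
        uint64_t gap = t[i] - t[i - 1];

        if (gap > s.max_gap_ns)
            s.max_gap_ns = gap;
        if (gap >= threshold_ns)
            s.count++;
    }
    return s;
}

/*
 * Busy-poll the clock n times into buf, then scan the samples.
 * Assumption: the caller has done what it can to avoid being
 * preempted (CPU pinning, RT priority); nothing here enforces that.
 */
struct gap_stats sample_and_scan(uint64_t *buf, size_t n, uint64_t threshold_ns)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = now_ns();

    return scan_gaps(buf, n, threshold_ns);
}
```

Calling sample_and_scan() with a buffer of a few thousand entries and a
threshold of some tens of microseconds gives a rough feel for how often
the CPU disappears from under you, but treat user-space numbers as a hint
rather than as evidence of SMI activity.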

Jon.

