Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs

From: Cyrill Gorcunov
Date: Wed Aug 25 2010 - 16:25:08 EST


On Wed, Aug 25, 2010 at 04:11:06PM -0400, Don Zickus wrote:
...
> > Uhhuh. NMI received for unknown reason 00 on CPU 15.
> > Do you have a strange power saving mode enabled?
> > Dazed and confused, but trying to continue
>
> So I found a Nehalem box that can reliably reproduce Ingo's problem using
> something as simple as 'perf top'. But like above, I am noticing the
> same thing, an extra NMI(PMI??) that comes out of nowhere.
>
> Looking at the data above the delta between nmis is very small compared to
> the other nmis. It almost suggests that this is an extra PMI.
> Considering there are already two cpu errata discussing extra PMIs under
> certain configurations, I wouldn't be surprised if this was a third.
>
> Cheers,
> Don
>

Oh. I'm not sure it would be a good idea at all, but maybe we could
use a variant of Robert's "pmu nmi relaxing time" idea, i.e. some time
slice in which we treat NMIs as coming from the PMU, though not an
arbitrary number of them but a number equal to the number of PMIs
turned off. Say we handle an NMI and find that 4 events have overflowed:
we clear them, arm a timer and wait for 3 unknown NMIs to happen. If they
do not happen within some time period, we clear this waitqueue; if they
happen, or partially happen, we destroy the timer. I.e. almost the same
as Robert's idea, but without the TSC? Just a thought.
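
To make the bookkeeping concrete, here is a rough userspace sketch of the
idea, not kernel code. All names (struct pmi_relax, pmi_relax_arm,
pmi_relax_unknown_nmi) are made up for illustration, and the "timer" is
modelled as a plain deadline in abstract ticks that the caller passes in.
It also assumes the window simply goes away once every expected NMI has
arrived or the deadline has passed, which is only one way to read the
"partially happen" case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU state; nothing here is an existing kernel API. */
struct pmi_relax {
	int      pending;   /* unknown NMIs we still attribute to the PMU */
	uint64_t deadline;  /* tick after which the window is stale */
};

/*
 * Call after a PMU NMI in which `overflowed` counters were found
 * overflowed and cleared. One NMI is the one being handled right now,
 * so up to (overflowed - 1) further NMIs may still be in flight.
 */
static void pmi_relax_arm(struct pmi_relax *st, int overflowed,
			  uint64_t now, uint64_t window)
{
	if (overflowed > 1) {
		st->pending  = overflowed - 1;
		st->deadline = now + window;   /* "arm the timer" */
	} else {
		st->pending = 0;               /* nothing extra expected */
	}
}

/*
 * Call for an NMI that no handler claimed. Returns true if it is
 * swallowed as a late PMI, false if it should be reported as unknown.
 */
static bool pmi_relax_unknown_nmi(struct pmi_relax *st, uint64_t now)
{
	if (st->pending <= 0)
		return false;                  /* no window armed */

	if (now > st->deadline) {
		st->pending = 0;               /* window expired: clear it */
		return false;
	}

	st->pending--;                         /* one expected NMI consumed */
	return true;
}
```

With 4 overflowed events, the next 3 unclaimed NMIs inside the window are
swallowed and a 4th is reported as unknown; if the window expires first,
the very next unclaimed NMI is reported and the count is dropped.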

-- Cyrill