Corey Minyard writes:
> +static int k7_watchdog_reset(int handled)
> +{
> + unsigned int low, high;
> + int source;
> +
> + rdmsr(MSR_K7_PERFCTR0, low, high);
Please use rdpmc() instead of rdmsr() when reading counter registers.
Ditto in the other places.
(I know oprofile doesn't, but that's no excuse.)

Ok, no problem.
> + /*
> + * If the timer has overflowed, this is certainly a watchdog
> + * source
> + */
> + source = (low & (1 << 31)) == 0;
> + if (source)
Why not "if ((int)low >= 0)"?

IIRC, the docs state that the timer goes off if the high bit is cleared in the register. I was just going with the documentation description. Not a big deal either way, I don't think.
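
For reference, the two forms of the overflow test discussed here can be compared in a small standalone sketch. This is not code from the patch: the function names are made up for illustration, and the signed version assumes a two's-complement target where converting an out-of-range value to int32_t simply reinterprets the bits (true on x86 with gcc).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Form used in the patch: bit 31 of the low word is clear. */
static int overflowed_mask(uint32_t low)
{
	return (low & (UINT32_C(1) << 31)) == 0;
}

/*
 * Form suggested in the review: the low word, read as a signed
 * value, is non-negative.  Assumes two's-complement conversion.
 */
static int overflowed_signed(uint32_t low)
{
	return (int32_t)low >= 0;
}

/* Spot-check that both forms agree on a few boundary values. */
static void check_equivalence(void)
{
	static const uint32_t samples[] = {
		0u, 1u, 0x7fffffffu, 0x80000000u, 0xffffffffu
	};
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(overflowed_mask(samples[i]) ==
		       overflowed_signed(samples[i]));
}
```

Under that assumption both forms give the same answer, so the choice is mostly about which one reads closer to the documentation.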
> + /*
> + * The only thing that SHOULD be before us is the oprofile
> + * code. If it has handled an NMI, then we shouldn't. This
> + * is a rather unnatural relationship, it would much better to
> + * build a perf-counter handler and then tie both the
> + * watchdog and oprofile code to it. Then this ugliness
> + * could go away.
> + */
Depending on the value of nmi_watchdog and how oprofile was
set up, neither, just one, or both of them can cause NMIs.
Only one of them can do it via the performance counters, however.

For nmi_watchdog=1, it can come from both the performance counters and the IOAPIC only if they both go off at almost exactly the same time (there is only one edge for both). Otherwise, the oprofile code can tell if it is the source, and it will return if it handled it or not. The race is small, but there. I guess I might have to implement the ugliness I describe below.
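
To make the "handled" relationship concrete, here is a rough userspace sketch of a handler chain in which each handler is told how many earlier handlers already claimed the NMI. Everything in it (the nmi_handler_fn type, dispatch_nmi(), and the two stand-in handlers) is hypothetical and only models the idea; it is not the kernel's NMI code and not what the patch implements.

```c
#include <stddef.h>

/*
 * Hypothetical handler type: receives the number of earlier handlers
 * that claimed this NMI, returns nonzero if it claims the NMI itself.
 */
typedef int (*nmi_handler_fn)(int handled);

/* Call every handler in order; return how many claimed the NMI. */
static int dispatch_nmi(const nmi_handler_fn *handlers, size_t n)
{
	int handled = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (handlers[i](handled))
			handled++;
	return handled;
}

/* Stand-in for oprofile: pretends it always recognizes its own NMI. */
static int fake_oprofile_handler(int handled)
{
	(void)handled;
	return 1;
}

/*
 * Stand-in for the watchdog: backs off if something before it already
 * handled the NMI, otherwise claims it.
 */
static int fake_watchdog_handler(int handled)
{
	return handled ? 0 : 1;
}
```

With both stand-ins chained in that order, only the first one claims the NMI; with the watchdog stand-in alone, the watchdog claims it. The simultaneous-edge race mentioned in the thread is exactly the case this simple model gets wrong.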
How do you handle multiple simultaneous NMIs from different sources?

That's why the current NMI hardware sucks so bad. In general, you cannot tell easily. However, you can tell if the NMI came from at least the NMI watchdog if nmi_watchdog=2. And you can usually tell if it was an ECC error or I/O error. I'm working on getting things changed in IPMI to be able to tell if one came from IPMI.

At OLS, we talked about this for a while and people are going to start trying to push the hardware vendors to improve the NMI hardware so the source can always be known. So things are not perfect, but I believe they are usable.