In testing the latest high-res-timer code on a 2-way SMP machine I have
found that it is easy to trigger an NMI oops. This happens when the
interval is coded to be very short. Currently I have the resolution set
to 1 micro second. The test code is trying to set up a timer with 1
micro second interval, something we just don't have the horse power to
do.
This puts the cpu into a loop processing the timer pops and starves the
other cpu causing an NMI. (As I understand it, each cpu need to handle
at least one interrupt ever 500 ticks to avoid the NMI. Usually
(always?) this interrupt is provided by the timer, however, the timer,
for some reason, always interrupts on the using (abusing) cpu. (Any one
who has a better understanding of the NMI watch dog operation, please
help me here.)
Solutions:
It seems prudent to limit the interval so as to keep it large enough to
avoid this issue. The problem is that, no matter what the limit is, it
can be subverted by starting another timer. I have considered keeping
track of all active timers with intervals under a jiffie and sliding the
limit based on that, but it seem to me that this is decidedly
unpredictable.
The code I currently have measures the supportable interval at boot time
and then, when ever a timer is about to be requested whose time will
fall in that interval, pushing the timer out (by adding intervals).
Alarms skipped by this are accounted for in the overrun counter.
However, as I observed above, this is easily defeated by starting
another timer.
Comments?
-- George george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Real time sched: http://sourceforge.net/projects/rtsched/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Sat Dec 15 2001 - 21:00:21 EST