Re: NMI watchdog + NOHZ question
From: David Miller
Date: Wed Jun 24 2009 - 05:44:37 EST
From: Andi Kleen <andi@xxxxxxxxxxxxxx>
Date: Wed, 24 Jun 2009 09:53:42 +0200
>> > Ah you have a one shot timer and it gets rescheduled in the softirq?
>> > If yes why not do that directly in the hardirq handler?
>>
>> Then what's the point of the generic timer code supporting one-shot
>> clock sources? :-)
>
> Well it would avoid that problem at least (I think based on your
> description). Somehow you need to reschedule the timer before the softirq.
>
> I guess you could have a generic function that is callable from hardirq
> directly?

Thinking about this some more, the issue I'm hitting has nothing to
do with how the timer fires.

The problem occurs when the cpu goes into NOHZ mode, and the timer
is not firing. And I suspect x86 would hit this problem too as
currently coded.

Using sparc64 first as a concrete example, the idle loop is essentially:

	while(1) {
		tick_nohz_stop_sched_tick(1);

		while (!need_resched() && !cpu_is_offline(cpu))
			sparc64_yield(cpu);

		tick_nohz_restart_sched_tick();

		preempt_enable_no_resched();
		...
		schedule();
		preempt_disable();
	}

And on this particular CPU type sparc64_yield() is simply

	touch_nmi_watchdog();

since this cpu doesn't support yielding.
So if we get that 5+ second qla2xxx interrupt storm during the
"while (!need_resched() ..." loop, no matter what we do the NMI
watchdog is going to trigger on us once the qla2xxx firmware
upload is complete.
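
A rough user-space model of that sequence, as a sketch only. It is not
the kernel's watchdog code: struct wd_model, model_touch(), model_nmi()
and storm_trips_watchdog() are hypothetical names, the tick is assumed
to be stopped for the whole run, and the threshold of 5 * nmi_hz
consecutive NMIs is taken from the figure quoted in this mail.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of a touch-based NMI watchdog; all names are hypothetical. */
struct wd_model {
	unsigned int nmi_hz;        /* watchdog NMIs per second */
	unsigned int alert_counter; /* consecutive NMIs that saw no progress */
	bool touched;               /* set by the idle loop, consumed by the NMI */
};

/* Stand-in for touch_nmi_watchdog() as called from the idle loop. */
static void model_touch(struct wd_model *wd)
{
	wd->touched = true;
}

/* One watchdog NMI. Returns true when the model would report a lockup. */
static bool model_nmi(struct wd_model *wd)
{
	if (wd->touched) {
		wd->touched = false;
		wd->alert_counter = 0;
		return false;
	}
	wd->alert_counter++;
	return wd->alert_counter >= 5 * wd->nmi_hz;
}

/*
 * The idle loop touches the watchdog before every NMI for idle_seconds,
 * then an interrupt storm keeps the loop from running for storm_seconds.
 * Returns true if the model watchdog fires at any point.
 */
static bool storm_trips_watchdog(unsigned int nmi_hz,
				 unsigned int idle_seconds,
				 unsigned int storm_seconds)
{
	struct wd_model wd = { .nmi_hz = nmi_hz };
	unsigned int i;

	assert(nmi_hz > 0);

	for (i = 0; i < idle_seconds * nmi_hz; i++) {
		model_touch(&wd);
		if (model_nmi(&wd))
			return true;
	}
	for (i = 0; i < storm_seconds * nmi_hz; i++) {
		/* No model_touch() here: the idle loop is not running. */
		if (model_nmi(&wd))
			return true;
	}
	return false;
}
```

In this model storm_trips_watchdog(1, 10, 6) returns true and
storm_trips_watchdog(1, 10, 4) returns false: however long the idle
loop has been touching the watchdog beforehand, the touches stop
counting as soon as the loop stops running.
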
X86 32-bit's cpu_idle() looks roughly like this:

	while (1) {
		tick_nohz_stop_sched_tick(1);
		while (!need_resched()) {

			check_pgt_cache();
			rmb();

			if (cpu_is_offline(cpu))
				play_dead();

			local_irq_disable();
			/* Don't trace irqs off for idle */
			stop_critical_timings();
			pm_idle();
			start_critical_timings();
		}
		tick_nohz_restart_sched_tick();
		preempt_enable_no_resched();
		schedule();
		preempt_disable();
	}

And similarly to sparc64, if that 5+ second qla2xxx interrupt
sequence happens after the tick_nohz_stop_sched_tick() call
we can run into the same situation, because the timer interrupt
count is not incrementing, and it won't do so for at least
"5 * nmi_hz" consecutive watchdog NMIs.
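
To make the dependence on the timer interrupt count concrete, here is a
small user-space sketch of that check. It is an illustration under
assumptions, not the kernel's implementation: struct irq_wd_model,
model_nmi_check() and watchdog_fires() are hypothetical names, and the
only input is whether the tick is still being counted while the cpu is
busy.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified model of the "did the timer interrupt count move?" check.
 * All names are hypothetical; this is not the kernel's code.
 */
struct irq_wd_model {
	unsigned int nmi_hz;            /* watchdog NMIs per second */
	unsigned int alert_counter;     /* consecutive NMIs with a frozen count */
	unsigned long last_timer_count; /* count seen by the previous NMI */
};

/* One watchdog NMI. Returns true when the model would report a lockup. */
static bool model_nmi_check(struct irq_wd_model *wd, unsigned long timer_count)
{
	if (timer_count != wd->last_timer_count) {
		/* The timer interrupt ran since the last NMI: start over. */
		wd->last_timer_count = timer_count;
		wd->alert_counter = 0;
		return false;
	}
	wd->alert_counter++;
	return wd->alert_counter >= 5 * wd->nmi_hz;
}

/*
 * The cpu is kept busy for busy_seconds. With tick_stopped set, the
 * timer interrupt count stays frozen for that whole time.
 */
static bool watchdog_fires(unsigned int nmi_hz, unsigned int busy_seconds,
			   bool tick_stopped)
{
	struct irq_wd_model wd = { .nmi_hz = nmi_hz };
	unsigned long timer_count = 0;
	unsigned int i;

	assert(nmi_hz > 0);

	for (i = 0; i < busy_seconds * nmi_hz; i++) {
		if (!tick_stopped)
			timer_count++; /* periodic tick is still counted */
		if (model_nmi_check(&wd, timer_count))
			return true;
	}
	return false;
}
```

With nmi_hz = 1, watchdog_fires(1, 6, true) returns true while
watchdog_fires(1, 6, false) returns false, which is the difference
between the NOHZ case and a cpu that still takes its periodic tick.
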