Re: Bug in on_each_cpu?
From: Zachary Amsden
Date: Thu Mar 01 2007 - 06:34:49 EST
Andrew Morton wrote:
On Thu, 01 Mar 2007 02:34:02 -0800 Zachary Amsden <zach@xxxxxxxxxx> wrote:
I think on_each_cpu has a serious bug now that hrtimers has been
integrated. Basically, there are many callsites of clock_was_set, none
of which actually know the current interrupt state - enabled or
disabled. So they save and restore irqs, as they should. But
on_each_cpu does not do so, creating bugs in any function which calls
these functions with interrupts disabled.
on_each_cpu() calls smp_call_function(). It is a bug to call
smp_call_function() with local interrupts disabled, hence it is a bug to
call on_each_cpu() with local interrupts disabled.
There is a WARN_ON() in smp_call_function() to catch this.
Ok, I thought that might be the case. We probably infinite looped in
the interrupt handler because of bad state and thus missed the WARN_ON
output.
Why is it a bug? Because there's a deadlock where this CPU is waiting for
CPU A to take the IPI, but CPU A is waiting (with interrupts disabled) for
this CPU to take an IPI.
Then the bug is not in on_each_cpu(). It is in the usage of
clock_was_set(). For example, look at do_settimeofday in kernel/timer.c:
write_sequnlock_irqrestore(&xtime_lock, flags);
/* signal hrtimers about time change */
clock_was_set();
return 0;
And timekeeping_resume has similar code (and called from a sysdev
callback, so I don't know what the interrupt state should be ). Either
the write_sequnlock_irqrestore is redundant, and should be merely an
write_sequnlock_irq, or the callsite is not prepared to handle enabling
interrupts temporarily as must be done for on_each_cpu(), which is a
pretty scary scenario.
What would be really, really nice would be to statically check all
callsites that issue irq disables actually keep irqs disabled.
Presumably, there was a reason they disabled irqs, and re-enabling them
underneath their noses, even if it is to avoid a race, breaks the logic
behind that reason.
Basically, it is just always a bug to re-enable interrupts from a caller
which has disabled them. Either there is no reason they needed to
disable them in the first place, and it is just a performance issue, or
they are updating some shared state used by the interrupt handler, which
can then operate incorrectly, or they were taking some lock to prevent
this, and the interrupt handler can then attempt to take a nested lock
and hang forever.
In our case, the bug is easily worked around - we don't actually need to
set the wallclock time here. But there is a deeper fundamental issue
which I think needs to be addressed.
Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/