On Wed, 5 Mar 2014, Rik van Riel wrote:
> There appears to be a deadlock in the hrtimer code. Specifically,
> clock_was_set() calls an IPI with wait=1, from softirq context.
> This should not be called from softirq context.
>
> Waiting for IPIs to complete in irq context can lead to a deadlock,
> because the current code (that was interrupted) might be holding some
> kind of lock that another CPU is waiting for with spin_lock_irq or
> similar.
>
> In other words, the current CPU may need to release a resource before
> the IPI can be handled by one of the destination CPUs.
>
> To my untrained eye, it does not look like this patch introduces a
> new bug to the timer code, but that is hard to ascertain with the
> timer code, so I am posting this as an RFC for the timer gods to hurt
> their brains on :)
>
> This bug was introduced by 54cdfdb4 in early 2007 (the original
> hrtimer code patch).

Right, and we had some issues with that until we moved the calls to
clock_was_set() out of lock-held regions.
The only call which happens from interrupt context is in
update_wall_time(). And that one definitely holds no locks which are
relevant.
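
To make the dependency concrete, here is a rough userspace sketch of the
wait-for relationship described above. It is not kernel code and does not
call any kernel API: the two-CPU model and the names has_wait_cycle() and
ipi_wait_deadlocks() are made up for illustration. CPU 0 sends the IPI with
wait=1 and blocks on CPU 1; CPU 1 only blocks on CPU 0 if the code that
CPU 0 interrupted holds a lock which CPU 1 spins on with interrupts
disabled.

```c
#include <stdbool.h>

#define NR_CPUS 2     /* hypothetical two-CPU model */
#define NOBODY  (-1)  /* this CPU is not blocked on anyone */

/*
 * waits_for[c] is the CPU that CPU c is blocked on, or NOBODY.
 * Follow each chain of waiters; a chain that is still going after
 * more than NR_CPUS hops must have revisited a CPU, i.e. a cycle.
 */
bool has_wait_cycle(const int waits_for[NR_CPUS])
{
	for (int start = 0; start < NR_CPUS; start++) {
		int cpu = start;

		for (int hops = 0; hops <= NR_CPUS && cpu != NOBODY; hops++)
			cpu = waits_for[cpu];

		if (cpu != NOBODY)
			return true;
	}
	return false;
}

/*
 * Model of the scenario from the report (an assumption, not real
 * kernel behaviour): CPU 0 is in irq context and waits for CPU 1 to
 * run the IPI handler.
 */
bool ipi_wait_deadlocks(bool interrupted_code_holds_lock)
{
	int waits_for[NR_CPUS] = { NOBODY, NOBODY };

	/* CPU 0 sent the IPI with wait=1, so it waits for CPU 1. */
	waits_for[0] = 1;

	/*
	 * CPU 1 spins with interrupts off on a lock held by the code
	 * CPU 0 interrupted, so it cannot take the IPI until CPU 0
	 * unlocks, which CPU 0 cannot do while it waits.
	 */
	if (interrupted_code_holds_lock)
		waits_for[1] = 0;

	return has_wait_cycle(waits_for);
}
```

In this model ipi_wait_deadlocks(true) reports a cycle and
ipi_wait_deadlocks(false) does not, which is the point of keeping the
calls out of lock-held regions: without a contended lock held by the
interrupted code, the second edge of the cycle never exists.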
On which kernel are you observing the issue?