Re: smp_call_function_single lockups
From: Linus Torvalds
Date: Tue Mar 31 2015 - 00:46:21 EST
On Mon, Mar 30, 2015 at 8:15 PM, Chris J Arges
<chris.j.arges@xxxxxxxxxxxxx> wrote:
> [ 13.613531] WARNING: CPU: 0 PID: 0 at ./arch/x86/include/asm/apic.h:444 apic_ack_edge+0x84/0x90()
> [ 13.613531] [<ffffffff8104d3f4>] apic_ack_edge+0x84/0x90
> [ 13.613531] [<ffffffff810cf8e7>] handle_edge_irq+0x57/0x120
> [ 13.613531] [<ffffffff81016aa2>] handle_irq+0x22/0x40
> [ 13.613531] [<ffffffff817a3b9f>] do_IRQ+0x4f/0x140
> [ 13.613531] [<ffffffff817a196d>] common_interrupt+0x6d/0x6d
> [ 13.613531] <EOI> [<ffffffff810def08>] ? hrtimer_start+0x18/0x20
> [ 13.613531] [<ffffffff8105a356>] ? native_safe_halt+0x6/0x10
> [ 13.613531] [<ffffffff810d5623>] ? rcu_eqs_enter+0xa3/0xb0
> [ 13.613531] [<ffffffff8101ecde>] default_idle+0x1e/0xc0
Hmm. I didn't notice that "hrtimer_start" was always there as a stale
entry on the stack when this happened.
That may well be immaterial - the CPU being idle means that the last
thing it did before going to sleep was likely that "start timer"
thing, but it's interesting even so.
It could be some issue with reprogramming the hrtimer just as it is
triggering, kind of similar to the bootup case I saw, where the keyboard
init sequence raises an interrupt that has already been cleared by the
time the interrupt is actually taken.
So maybe something like this happens:
 - local timer is about to go off and raises the interrupt line
 - in the meantime, we're reprogramming the timer into the future
 - the CPU takes the interrupt, but now the timer has been
   reprogrammed, so the irq line is no longer active, and ISR is zero
   even though we took the interrupt (which is why the new warning
   triggers)
 - we're running the local timer interrupt (which happened due to the
   *old* programmed value), but we do something wrong because when we
   read the timer state, we see the *new* programmed value and so we
   think that it's the new timer that triggered.
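To make that sequence concrete, here is a toy userspace model of it in C.
It is only a sketch of the hypothesis above, not kernel or APIC code: every
name in it (toy_timer, toy_cpu, toy_race and so on) is made up for
illustration, and the rule that reprogramming drops the interrupt line is
an assumption of the model, not a statement about real hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy types: none of this is a real kernel or APIC interface. */
struct toy_timer {
    uint64_t expires;     /* currently programmed expiry */
    bool line_active;     /* interrupt line currently asserted */
};

struct toy_cpu {
    bool irq_latched;     /* CPU has committed to taking the interrupt */
    bool isr_bit;         /* in-service bit as seen when the irq is acked */
};

struct toy_result {
    bool taken;           /* the interrupt was delivered */
    bool isr_bit;         /* ISR state observed by the handler */
    uint64_t seen_expiry; /* expiry the handler reads from the timer */
};

/* Step 1: the timer reaches its expiry and raises the line. */
void toy_timer_fire(struct toy_timer *t, struct toy_cpu *c, uint64_t now)
{
    if (now >= t->expires) {
        t->line_active = true;
        c->irq_latched = true;
    }
}

/* Step 2: reprogram into the future. ASSUMPTION: this drops the line. */
void toy_timer_reprogram(struct toy_timer *t, uint64_t new_expires)
{
    t->expires = new_expires;
    t->line_active = false;
}

/* Step 3: the CPU takes the latched interrupt; ISR follows the line state. */
bool toy_cpu_take_irq(const struct toy_timer *t, struct toy_cpu *c)
{
    if (!c->irq_latched)
        return false;
    c->irq_latched = false;
    c->isr_bit = t->line_active;
    return true;
}

/* Run the whole hypothesized race and report what the handler would see. */
struct toy_result toy_race(uint64_t old_expiry, uint64_t new_expiry)
{
    struct toy_timer timer = { .expires = old_expiry, .line_active = false };
    struct toy_cpu cpu = { .irq_latched = false, .isr_bit = false };
    struct toy_result res = { false, false, 0 };

    toy_timer_fire(&timer, &cpu, old_expiry);
    toy_timer_reprogram(&timer, new_expiry);
    res.taken = toy_cpu_take_irq(&timer, &cpu);
    res.isr_bit = cpu.isr_bit;
    /* Step 4: the handler reads timer state and sees the *new* value. */
    res.seen_expiry = timer.expires;
    return res;
}
```

Calling toy_race(100, 200) in this model gives an interrupt that was taken
with the ISR bit clear, and a handler that reads 200 rather than the 100
that actually caused the interrupt. It says nothing about why that would
lock up.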
I dunno. I don't see why we'd lock up, but DaveJ's old lockup had
several signs that it seemed to be timer-related.
It would be interesting to see the actual irq number. Maybe this has
nothing what-so-ever to do with the hrtimer.
Linus