Re: [xen] double fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC

From: Linus Torvalds
Date: Mon Oct 07 2013 - 18:14:53 EST


On Mon, Oct 7, 2013 at 1:35 AM, Fengguang Wu <fengguang.wu@xxxxxxxxx> wrote:
> On Mon, Oct 07, 2013 at 01:12:17AM -0700, Linus Torvalds wrote:
>
> My pleasure! Here are 100 randomly selected call traces. Also attached
> several full dmesgs and the kconfig.

Ok, they may be randomly selected, but they are all the same. Which is
good, I guess: it means we're only talking about one bug.

Anyway, they all have RIP:run_timer_softirq+0x12c/0x1b8, and the code is

   0:   8b 65 c8                mov    -0x38(%rbp),%esp
   3:   4d 39 ec                cmp    %r13,%r12
   6:   0f 84 2f ff ff ff       je     0xffffffffffffff3b
   c:   41 8b 4c 24 18          mov    0x18(%r12),%ecx
  11:   4d 8b 74 24 20          mov    0x20(%r12),%r14
  16:   4d 8b 7c 24 28          mov    0x28(%r12),%r15
  1b:   4c 89 63 38             mov    %r12,0x38(%rbx)
  1f:   49 8b 44 24 08          mov    0x8(%r12),%rax
  24:   49 8b 14 24             mov    (%r12),%rdx
  28:   83 e1 02                and    $0x2,%ecx
  2b:*  48 89 42 08             mov    %rax,0x8(%rdx)     <-- trapping instruction
  2f:   48 89 10                mov    %rdx,(%rax)
  32:   48 b8 00 02 20 00 00    movabs $0xdead000000200200,%rax

where that constant is LIST_POISON2 and the "and $2" seems to be
TIMER_IRQSAFE. So the trapping instruction *looks* like it's doing the
"next->prev = prev" half of __list_del() on the timer, and
timer->entry.next is NULL.
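
To make that decode concrete, here is a minimal userspace sketch of the
unlink, assuming the usual x86-64 layout where "next" sits at offset 0x0
and "prev" at 0x8. The sketch_ names are made up for illustration (the
kernel's own helpers are __list_del() and list_del()), and the poison
value is simply the constant from the disassembly:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next;	/* offset 0x0 with 8-byte pointers */
	struct list_head *prev;	/* offset 0x8 with 8-byte pointers */
};

/* Same constant as the movabs in the dump (assumes 64-bit pointers). */
#define SKETCH_POISON2 ((struct list_head *)(uintptr_t)0xdead000000200200ULL)

/* An empty list is a head that points at itself. */
static void sketch_list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert new_entry right after head. */
static void sketch_list_add(struct list_head *new_entry, struct list_head *head)
{
	new_entry->next = head->next;
	new_entry->prev = head;
	head->next->prev = new_entry;
	head->next = new_entry;
}

/* Unlink entry from its neighbours, in the same order as the dump. */
static void sketch_list_del(struct list_head *entry)
{
	struct list_head *prev = entry->prev;	/* mov 0x8(%r12),%rax */
	struct list_head *next = entry->next;	/* mov (%r12),%rdx */

	/* A zeroed entry would fault on the next store; fail loudly instead. */
	assert(prev != NULL && next != NULL);

	next->prev = prev;	/* mov %rax,0x8(%rdx)  <-- the trapping store */
	prev->next = next;	/* mov %rdx,(%rax) */
	entry->prev = SKETCH_POISON2;	/* movabs $0xdead000000200200,%rax */
}
```

If the structure embedding the entry got zeroed while it was still on
the list, "next" is NULL and the first store is the one that blows up,
which is what the trace shows.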

So somebody added a timer, and then deallocated/cleared the structure
before it triggered. The problem is, I can't see a way to figure out
_who_ did that.

I *think* r14 contains the function we're going to jump to (it gets
loaded from 0x20(%r12), which should be timer->function), and that
could be interesting to know, but the oops doesn't decode it, so you'd
have to match it up against a symbol map...
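
Matching it up by hand is just a nearest-lower-address search. A rough
sketch, assuming a System.map-style listing of "address type name" lines
already sorted by address; struct ksym, parse_map_line() and
resolve_addr() are names invented here, not existing tools:

```c
#include <stddef.h>
#include <stdio.h>

/* One symbol-map entry (hypothetical helper type). */
struct ksym {
	unsigned long long addr;
	char type;
	char name[128];
};

/* Parse one "address type name" line; returns 1 on success, 0 otherwise. */
static int parse_map_line(const char *line, struct ksym *out)
{
	return sscanf(line, "%llx %c %127s",
		      &out->addr, &out->type, out->name) == 3;
}

/*
 * Binary search over a table sorted by ascending address. Returns the
 * last symbol whose address is <= target, or NULL if target lies below
 * the first symbol.
 */
static const struct ksym *resolve_addr(const struct ksym *tab, size_t n,
				       unsigned long long target)
{
	const struct ksym *best = NULL;
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].addr <= target) {
			best = &tab[mid];
			lo = mid + 1;
		} else {
			hi = mid;
		}
	}
	return best;
}
```

You'd load the map into an array with parse_map_line(), call
resolve_addr() with the r14 value from the oops, and report the result
as name plus (r14 - addr). It only says which symbol the address falls
after, so a value that isn't a text address at all will still "resolve"
to something and needs a sanity check.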

Linus