Re: 2.6.34-rc4 : OOPS in unmap_vma

From: Borislav Petkov
Date: Wed Apr 14 2010 - 11:22:42 EST


From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Wed, Apr 14, 2010 at 07:32:08AM -0700

Hi Linus,

> On Wed, 14 Apr 2010, Borislav Petkov wrote:
> >
> > hmm, it doesn't look like it. Your code translates to something like
> >
> >    0:   b8 00 00 00 00          mov    $0x0,%eax
> >    5:   80 ff ff                cmp    $0xff,%bh
> >    8:   ff 48 21                decl   0x21(%rax)
> >    b:   45 80 48 8b 45          rex.RB orb $0x45,-0x75(%r8)
> >   10:   80 48 ff c8             orb    $0xc8,-0x1(%rax)
>
> There's a large constant (0xffffff8000000000) in there at the beginning,
> and the disassembly hasn't found the start of the next instruction very
> cleanly. The same is true at the end: another large constant is cut off in
> the middle.
>
> The byte just before the dumped instruction stream is almost certainly
> '48h', and the last byte of the last constant is 0xff, and the disassembly
> ends up being:
>
>    0:   48 b8 00 00 00 00 80    mov    $0xffffff8000000000,%rax
>    7:   ff ff ff
>    a:   48 21 45 80             and    %rax,-0x80(%rbp)
>    e:   48 8b 45 80             mov    -0x80(%rbp),%rax
>   12:   48 ff c8                dec    %rax
>   15:   48 3b 85 40 ff ff ff    cmp    -0xc0(%rbp),%rax
>   1c:   48 8b 85 50 ff ff ff    mov    -0xb0(%rbp),%rax
>   23:   48 0f 42 7d 80          cmovb  -0x80(%rbp),%rdi
>   28:   48 89 7d 80             mov    %rdi,-0x80(%rbp)
>   2c:*  48 8b 38                mov    (%rax),%rdi     <-- trapping instruction
>   2f:   48 85 ff                test   %rdi,%rdi
>   32:   0f 84 f5 04 00 00       je     0x52d
>   38:   48 b8 fb 0f 00 00 00    mov    $0xffffc00000000ffb,%rax
>   3f:   c0 ff ff
>
> But yes, you found the right spot (that 0xffffff8000000000 constant is
> -549755813888 decimal):

Right, the decodecode output looked kinda strange to me, so I tried
to match the instruction order and find the location. But yeah, now
that I'm looking at show_registers(), I see that we don't start dumping
on a precise instruction boundary but simply dump 64 bytes in the
default case. No time for an instruction decoder along that path :).
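
For anyone following along, here is a rough sketch of the byte-level
reasoning above. It glues the missing 0x48 prefix onto the truncated first
instruction and the missing 0xff onto the truncated last one, then decodes
the little-endian 64-bit immediates. The helper names are made up for
illustration, and the "43 bytes before RIP" prologue figure is my
recollection of the oops code of that era, so treat it as an assumption
rather than a statement about what show_registers() does.

```python
# Truncated instructions as they appear at the two ends of the raw dump.
HEAD = bytes.fromhex("b8 00 00 00 00 80 ff ff ff")  # leading 0x48 cut off
TAIL = bytes.fromhex("48 b8 fb 0f 00 00 00 c0 ff")  # trailing 0xff cut off


def movabs_rax_imm(insn: bytes) -> int:
    """Return the imm64 of a 'REX.W mov $imm64,%rax' (48 b8 + 8 bytes)."""
    if len(insn) != 10 or insn[0] != 0x48 or insn[1] != 0xB8:
        raise ValueError("not a 10-byte 48 b8 imm64 encoding")
    # x86 immediates are stored little-endian.
    return int.from_bytes(insn[2:], "little")


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as two's complement."""
    return value - (1 << 64) if value & (1 << 63) else value


first = movabs_rax_imm(b"\x48" + HEAD)
last = movabs_rax_imm(TAIL + b"\xff")

print(hex(first))          # the large constant at the start of the dump
print(as_signed64(first))  # same value as a signed decimal
print(hex(last))           # the large constant at the end of the dump

# Assumption (from memory, not from this thread): the dump covers 64 bytes
# by default and 43 of them precede the faulting RIP.
CODE_BYTES = 64
PROLOGUE = CODE_BYTES * 43 // 64
# In the rebuilt listing the trapping instruction sits at 0x2c, one byte
# later than in the raw dump because of the prepended 0x48.
print(PROLOGUE == 0x2C - 1)
```

The signed value is -(1 << 39), which is why the constant shows up as a
large negative decimal in the compiler's .s output.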

> > which I could correlate with what I get here (comments added):
>
> Yup. Close enough. Btw, it's often good to look at both the *.s code _and_
> the *.lst code. If you do "make mm/memory.lst", you'll find those big
> constants easily, and then you'll see the code this way:

[..]

OK, I can't say that I'm a Linux newbie, but the .lst code is new to me.
Damn, and I thought I knew it all :)

> > so it looks like it tries to find a page table rooted at that address
> > but the pointer value of 0000000000002203 is bogus.
>
> Yes, it does look like some strange page table corruption, doesn't look
> anon_vma related at all. It's intriguing that it started happening now,
> though, so..
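
A quick sketch of why 0000000000002203 cannot be a sane pointer to a page
table, assuming it is the value being dereferenced by the trapping
instruction and assuming 4-level paging with 48-bit virtual addresses. The
function names and the "good" example address are illustrative only.

```python
def is_canonical(addr: int) -> bool:
    """True if bits 63..47 are all equal (48-bit virtual addresses assumed)."""
    top = addr >> 47
    return top == 0 or top == (1 << 17) - 1


def is_kernel_half(addr: int) -> bool:
    """True if addr is canonical and lies in the upper (kernel) half."""
    return is_canonical(addr) and (addr >> 63) == 1


def looks_like_table_pointer(addr: int) -> bool:
    """Kernel-half address that is aligned to an 8-byte table entry."""
    return is_kernel_half(addr) and addr % 8 == 0


bogus = 0x0000000000002203
print(is_kernel_half(bogus))            # lower half, so not a kernel address
print(bogus % 8)                        # not 8-byte aligned either
print(looks_like_table_pointer(bogus))

# Illustrative kernel-half address, chosen only to exercise the check.
plausible = 0xFFFF880000000000
print(looks_like_table_pointer(plausible))
```

This only says the value is implausible as a kernel pointer; it says
nothing about where the corruption came from.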

Well, Parag said something about a kexec kernel, so it would be
interesting to know what he means there: a kexec-enabled kernel, or the
"second" kernel his machine kexec'd into after a previous failure? I
think this could clarify the situation a bit.

Thanks for looking over the asm.

--
Regards/Gruss,
Boris.