Re: 2.6.31-rc1 crashes randomly on my Machine.
From: Al Viro
Date: Sat Jun 27 2009 - 12:43:10 EST
On Sat, Jun 27, 2009 at 01:10:46PM +0200, Zeno Davatz wrote:
> > which is at least not entirely implausible. So it seems to be a memory
> > corruption in .text, which might or might not affect the directly
> > preceding bytes (0xe9 <signed 32bit int> is a relative jump, so there's
> > no way to tell whether this 0xff had been the only byte affected - it
> > would be preceded by 3 0xff coming from small negative integer anyway).
>
> I just did another pull from Linus's Git repository and booted
> the latest 2.6.31-rc1, and my machine still hangs after boot-up,
> with the following message at the end of /var/log/messages:
>
> Jun 27 03:01:52 zenogentoo Stack:
> Jun 27 03:01:52 zenogentoo c10d14f2 f2eb9f5c c10ab407 00000400 b8033000 f6b43d80 f33cbe28 00000000
> Jun 27 03:01:52 zenogentoo <0> 00000000 f65c9000 00001000 00000000 00000000 00000000 f721a100 fffffffb
> Jun 27 03:01:52 zenogentoo <0> c10d13e5 f2eb9f64 c10f1522 f2eb9f98 00000400 b8033000 f6b43d80 f6b43d80
> Jun 27 03:01:52 zenogentoo Call Trace:
> Jun 27 03:01:52 zenogentoo [<c10d14f2>] ? seq_read+0x10d/0x3a5
> Jun 27 03:01:52 zenogentoo [<c10ab407>] ? mmap_region+0x1bf/0x41a
> Jun 27 03:01:52 zenogentoo [<c10d13e5>] ? seq_read+0x0/0x3a5
> Jun 27 03:01:52 zenogentoo [<c10f1522>] ? proc_reg_read+0x57/0x78
> Jun 27 03:01:52 zenogentoo [<c10bb257>] ? vfs_read+0x8b/0x141
> Jun 27 03:01:52 zenogentoo [<c10f14cb>] ? proc_reg_read+0x0/0x78
> Jun 27 03:01:52 zenogentoo [<c10bb3b6>] ? sys_read+0x3d/0x6b
> Jun 27 03:01:52 zenogentoo [<c1021290>] ? sysenter_do_call+0x12/0x2c
> Jun 27 03:01:52 zenogentoo Code: 0b fc f6 50 0b fc f6 01 00 00 00 00 00 00 00 60 0b fc f6 60 0b fc f6 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 <ff> ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00
> Jun 27 03:01:52 zenogentoo EIP: [<f6fc0b7c>] 0xf6fc0b7c SS:ESP 0068:f2eb9efc
> Jun 27 03:01:52 zenogentoo ---[ end trace 1b3422263ead727b ]---

Jumped to nowhere. For one thing, 0xf6fc0b7c is nowhere near the addresses
where the kernel code would live. For another, the contents of memory at
that address don't look like code (a lot of 0, a lot of 0xff *and* several
32bit values that look like addresses nearby: 0xf6fc0b50, 0xf6fc0b60).
IOW, some data structures; hell knows what it might have been, but we
have seq_read() seeing m->op->start that points somewhere strange.

Again, memory corruption of some kind. We have file->private_data that
might have been screwed up, or it might have been the right pointer, but
the struct seq_file it points to had been overwritten with some crap, or
that might have happened to the methods table that ->op of that seq_file
points to...
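
FWIW, the "that's data, not code" reading can be checked mechanically.
What follows is a rough Python sketch, not anything from the kernel tree:
the helper names are made up for illustration, and the "kernel text range"
is only approximated from the call-trace addresses in the oops (modules
get loaded elsewhere, so treat it as a sanity check and nothing more).
It decodes the Code: bytes as aligned little-endian 32bit words and picks
out the ones that land close to EIP.

```python
import struct

EIP = 0xF6FC0B7C

# "Code:" bytes from the oops, with the <ff> marker (the byte at EIP)
# written as a plain byte. 43 bytes precede EIP, 64 bytes in total.
CODE = bytes.fromhex(
    "0b fc f6 50 0b fc f6 01 00 00 00 00 "
    "00 00 00 60 0b fc f6 60 0b fc f6 00 00 00 00 00 00 00 00 00 00 00 00 "
    "08 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00"
)
BYTES_BEFORE_EIP = 43

# Return addresses taken from the "Call Trace:" lines of the same oops.
TRACE = [0xC10D14F2, 0xC10AB407, 0xC10D13E5, 0xC10F1522,
         0xC10BB257, 0xC10F14CB, 0xC10BB3B6, 0xC1021290]


def aligned_words(code, eip, before):
    """(address, value) for every complete 4-byte-aligned LE word in the dump."""
    start = eip - before        # address of code[0]
    skip = -start % 4           # bytes to drop to reach 4-byte alignment
    words = []
    for off in range(skip, len(code) - 3, 4):
        (value,) = struct.unpack_from("<I", code, off)
        words.append((start + off, value))
    return words


def nearby_pointers(words, eip, window=0x1000):
    """Words whose value lies within `window` bytes of EIP (hypothetical cutoff)."""
    return [(addr, val) for addr, val in words if abs(val - eip) < window]


def distance_from_trace(eip, trace):
    """How far EIP is from the address range spanned by the call trace."""
    lo, hi = min(trace), max(trace)
    if lo <= eip <= hi:
        return 0
    return min(abs(eip - lo), abs(eip - hi))


if __name__ == "__main__":
    words = aligned_words(CODE, EIP, BYTES_BEFORE_EIP)
    for addr, val in words:
        print(f"{addr:08x}: {val:08x}")
    print("near EIP:", [f"{v:08x}" for _, v in nearby_pointers(words, EIP)])
    print(f"distance from call-trace range: {distance_from_trace(EIP, TRACE):#x}")
```

The word at EIP itself decodes to 0xffffffff, and the only non-trivial
values are small integers and pointers back into the same few dozen bytes,
which fits the description above of some data structure.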

Having looked at what seq_read() has compiled to in your kernel... what's
the value of ECX in that oops? That's where m->op ends up and a look at
that sucker might at least narrow it down.

That said, at this point I'd
* run memtest, just to exclude the hardware crapping itself; nastier
  coincidences have happened
* try bisecting, if the oopsen are easy to trigger.