Re: [vma list corruption] Re: proc_pid_readlink oopses again on 2.6.14.5

From: Joshua Kwan
Date: Wed Dec 28 2005 - 02:24:39 EST


Al Viro wrote:
> Until the last line it made sense. Code, however, is flat-out BS.
> This chunk is from around proc_exe_link(), all right. But it starts
> at 3 bytes before the beginning of that function. Perfect match to
> build with your .config using gcc4, but... no way in hell you would
> get an oops at that location - it's in the middle of long chunk of
> NOP. So something's rotten here...

Do you think it might be a subtle compiler problem, and if I compiled it
with GCC 3.3 it might go away?

I'm willing to help diagnose this problem, but this is a production box
I'm messing with, and I don't want to reboot it more than a few times,
so I want to make those tries count with advice from folks like you :)

What do you think about the oopses in my previous post?
http://www.ussg.iu.edu/hypermail/linux/kernel/0512.0/0199.html
Those were triggered by running 'pidof pppd' (well, I'm not sure how the
first one came about), which again walks the /proc/*/ entries.
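
For reference, here is a minimal sketch of the kind of per-process lookup
that I assume pidof and lsof perform while walking /proc. As far as I
understand, a readlink() on /proc/<pid>/exe is what ends up in
proc_pid_readlink() and from there in proc_exe_link(). The helper name
read_exe_link() is made up for illustration; it is not taken from pidof.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: resolve /proc/<pid>/exe into buf.
 * Returns the length of the link target, or -1 on error (process gone,
 * no permission, /proc not mounted, zero-sized buffer).
 */
static ssize_t read_exe_link(pid_t pid, char *buf, size_t len)
{
    char path[64];
    ssize_t n;

    assert(buf != NULL);
    if (len == 0)
        return -1;

    snprintf(path, sizeof(path), "/proc/%ld/exe", (long)pid);

    /* readlink() does not NUL-terminate, so keep one byte in reserve. */
    n = readlink(path, buf, len - 1);
    if (n < 0)
        return -1;

    buf[n] = '\0';
    return n;
}
```

Calling read_exe_link(3399, buf, sizeof(buf)) on an affected kernel would
presumably be enough to step on the bad vma, so I would only try that when
a reboot is acceptable.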

> So you've got 0xb7c1fc20 as vma. Which is not good, since that's a userland
> address. The next question is where it'd come from - it might be
> * fscked task->mm
> * fscked mm->mmap
> * fscked vma somewhere in the chain.

Note that 2.6.12 is running peachy on the machine right now, so it's not
a hardware problem.

> Doing lsof will walk vma chains of many processes, so if something is
> corrupted it will step into that...

Understood. In this particular case, it seems to have been an apache2
process (3399).
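
To make sure I follow the reasoning about 0xb7c1fc20, here is a rough
user-space sketch of the sanity check involved. It assumes 32-bit x86 with
the default 3G/1G split, so that kernel pointers start at 0xC0000000; the
names ASSUMED_PAGE_OFFSET, looks_like_kernel_pointer() and first_bad_link()
are invented for illustration, and the chain is modelled as a plain array
of link values rather than real vm_area_struct objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-bit x86, default 3G/1G user/kernel split. */
#define ASSUMED_PAGE_OFFSET 0xC0000000u

/* A pointer to a kernel object must not fall in the userland range. */
static int looks_like_kernel_pointer(uint32_t addr)
{
    return addr >= ASSUMED_PAGE_OFFSET;
}

/*
 * Walk a chain given as an array of successive "next" link values.
 * A value of 0 marks a clean end of the chain.
 * Returns the index of the first link that points into userland,
 * or -1 if every link up to the end looks sane.
 */
static int first_bad_link(const uint32_t *links, size_t n)
{
    assert(links != NULL);

    for (size_t i = 0; i < n; i++) {
        if (links[i] == 0)
            return -1;          /* reached the end without trouble */
        if (!looks_like_kernel_pointer(links[i]))
            return (int)i;      /* corrupted link found here */
    }
    return -1;
}
```

With links of { 0xc1000000, 0xb7c1fc20, 0 } the second entry gets flagged,
which matches the point that the bad value could sit in task->mm, in
mm->mmap, or in any vma further down the chain.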

Thanks for your diagnosis thus far.

--
Joshua Kwan
