Re: :-/

Linus Torvalds (torvalds@cs.helsinki.fi)
Tue, 17 Sep 1996 23:06:13 +0300 (EET DST)


On Tue, 17 Sep 1996, Boris Tobotras wrote:
>
> Hi, it's me again. Again, during ip-up. And I found its cause :)
> The OOPS comes when some process tries to access /proc/97/stat. 97 is the PID
> of diald-0.14, which is running now. (Works just fine, BTW :) This is what
> "cat stat" looks like:

You have some _serious_ memory corruption somewhere. This latest panic is
due to "tsk->sig" being a bogus value (0x43434700 - looks like the string
"\0GCC" but that's rather strange too). That's why it panics when it tries
to "cat /proc/x/stat" - the process info is corrupted.
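
As a quick sanity check on that reading of the bogus pointer, the sketch below
lays out the 32-bit value in memory order. It assumes a little-endian (x86)
machine, which the thread does not state explicitly; the names le_bytes and
bogus_sig are made up for illustration.

```python
import struct


def le_bytes(value: int) -> bytes:
    """Return the 4 bytes of a 32-bit value in little-endian memory order."""
    return struct.pack("<I", value)


# Value of tsk->sig quoted above; assumed to be stored little-endian (x86).
bogus_sig = 0x43434700
raw = le_bytes(bogus_sig)

# Lowest address first: 0x00, then 'G' (0x47), 'C' (0x43), 'C' (0x43).
print(raw)  # b'\x00GCC'
```

Read as bytes rather than as an address, the value is ASCII text, which
suggests that string data was written over the pointer.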

Your other panics have looked like the bitmap for the free list handling is
corrupted, or possibly just the free list pointers within mem_map[] are
totally bad for some reason. It looks like _really_ major memory corruption,
but I can't see anything in your setup that could cause it. I can't even
blame it on any strange support, as your very minimal kernel also acted very
strangely.

Quite frankly, it _feels_ like some serious problem with the hardware, but
you say that the machine is generally stable. A hardware problem that
exhibits this kind of major corruption would be likely to bring the machine
down fairly quickly, rather than wait for you to run crashme ;-)

I could imagine a missing TLB invalidate resulting in problems like these,
with user-level programs subtly modifying pages that we have already freed
for other use. I've had those kinds of problems before, but I'd expect those
kinds of problems to hit a lot more people (and they shouldn't be repeatable
at all).
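
The missing-invalidate scenario can be sketched with a toy model. This is not
kernel code: ToyMMU, run, and the frame numbers are all hypothetical, and the
"TLB" is just a dictionary caching virtual-page-to-frame translations. It only
illustrates how a stale cached translation lets a user write land in a frame
that has already been freed and reused.

```python
class ToyMMU:
    """Toy model: a page table plus a TLB that caches its translations."""

    def __init__(self, frames: int):
        self.memory = ["" for _ in range(frames)]  # one string per frame
        self.page_table = {}  # virtual page -> physical frame
        self.tlb = {}         # cached copy of page_table entries

    def map(self, vpage: int, frame: int) -> None:
        self.page_table[vpage] = frame

    def unmap(self, vpage: int, invalidate_tlb: bool) -> int:
        """Remove the mapping and return the frame that is now free."""
        frame = self.page_table.pop(vpage)
        if invalidate_tlb:
            self.tlb.pop(vpage, None)
        return frame

    def write(self, vpage: int, data: str) -> None:
        # On a TLB miss, walk the page table; KeyError stands in for a fault.
        if vpage not in self.tlb:
            self.tlb[vpage] = self.page_table[vpage]
        self.memory[self.tlb[vpage]] = data


def run(invalidate_tlb: bool) -> str:
    mmu = ToyMMU(frames=8)
    mmu.map(vpage=0, frame=5)
    mmu.write(0, "user data")                    # loads the TLB entry
    freed = mmu.unmap(0, invalidate_tlb)
    mmu.memory[freed] = "kernel free-list data"  # frame reused elsewhere
    try:
        mmu.write(0, "stray user write")         # user touches the old page
    except KeyError:
        pass                                     # faulted, as it should
    return mmu.memory[freed]


print(run(invalidate_tlb=True))   # kernel free-list data
print(run(invalidate_tlb=False))  # stray user write
```

With the invalidate in place the late write faults; without it, the write goes
through the stale entry and silently overwrites whatever now lives in the
reused frame.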

What kernels have worked on this machine? Is 1.2.13 rock solid on it?

Linus