Re: 2.2.10 oops (finally, something I can report!)

Nate Eldredge (nate@cartsys.com)
Wed, 30 Jun 1999 21:39:20 -0700


Linus wrote:
>
> In article <Pine.LNX.4.05.9906300304020.7161-100000@vitelus.com>,
> Aaron Lehmann <aaronl@vitelus.com> wrote:
> >This time I had the fortune of an Oops that didn't lock up the machine. I'm
> >going to apply KMSGDUMP so I can send all future oopses also.
> >
> >I hope this helps fix the stability problems:
> >
> >Reading Oops report from the terminal
>
> Interesting.
>
> The oops looks fine. The symbolic information also looks fine: the code
> in question does in fact look like it is the second instruction in
> "inet_sendmsg()". Everything basically seems to say that the oops is
> correctly decoded and caught.
>
> The thing that does NOT make sense is the cause of the oops itself,
> though.
>
> The oops happens on
>
> c017b651 pushl %ebx
>
> and %esp = c3941e80.
>
> And quite frankly, there's not a way in h*ll that that instruction could
> raise the exception in question. But it does.
>
> I would _strongly_ suspect one of two things:
> - bad CPU.
> - bad cache or RAM timings.

I had a Cyrix CPU some time back that had a *very* similar problem. I
believe the machine was running 2.0.36. Anyway, it worked absolutely
fine until one day I built EGCS. The resulting binary would crash about
one run in three. Poking around with a debugger showed that the
instruction on which it crashed was an access to a perfectly valid
address (according to /proc/xx/maps). Swapping in a different CPU (I
think it was an Intel Pentium) fixed it. ISTR it could also be fixed by
turning off the L1 cache, or something equally unacceptable
performance-wise.
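
The check of a faulting address against the process's maps file can be
sketched roughly as follows. This is only an illustration in Python, not
the procedure actually used at the time; the helper names (parse_maps,
find_region, access_allowed) and the sample maps excerpt are
hypothetical, and only the leading "start-end perms" fields of each line
are assumed.

```python
import re

# Each maps line starts with "start-end perms", e.g. "08048000-0804c000 r-xp".
MAPS_LINE = re.compile(r"^([0-9a-fA-F]+)-([0-9a-fA-F]+)\s+(\S+)")

# Hypothetical excerpt, made up for illustration (not from the original machine).
SAMPLE_MAPS = """
08048000-0804c000 r-xp 00000000 03:01 12345 /bin/cat
0804c000-0804d000 rw-p 00003000 03:01 12345 /bin/cat
bfffe000-c0000000 rwxp fffff000 00:00 0
"""


def parse_maps(text):
    """Return a list of (start, end, perms) tuples from maps-style text."""
    regions = []
    for line in text.splitlines():
        m = MAPS_LINE.match(line)
        if m:  # blank or unrecognized lines are skipped
            regions.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return regions


def find_region(regions, addr):
    """Return the region containing addr (end is exclusive), or None."""
    for start, end, perms in regions:
        if start <= addr < end:
            return (start, end, perms)
    return None


def access_allowed(regions, addr, kind):
    """kind is 'r', 'w' or 'x'; True if addr is mapped with that permission."""
    region = find_region(regions, addr)
    return region is not None and kind in region[2]


if __name__ == "__main__":
    regions = parse_maps(SAMPLE_MAPS)
    for addr, kind in [(0x0804C010, "w"), (0x08048010, "w"), (0x10000000, "r")]:
        print(hex(addr), kind, access_allowed(regions, addr, kind))
```

On a live system the same functions could be fed the contents of the
crashing process's maps file instead of SAMPLE_MAPS. If the faulting
address lands in a region with the right permission and the access
still traps, the hardware becomes the more plausible suspect.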

Here is its /proc/cpuinfo. It was a 110 MHz (aka "133+") 6x86.

processor : 0
cpu : 586
model : 6x86
vendor_id : CyrixInstead
stepping : 2.5, core/bus clock ratio: 2x
fdiv_bug : no
hlt_bug : no
f00f_bug : no
fpu : yes
fpu_exception : yes
cpuid : yes
wp : yes
flags : fpu
bogomips : 110.18

In short, I think the CPU is a reasonable suspect. I'd suggest
replacing it, if you can beg, borrow, or steal another (a different
brand might be wise), and seeing whether anything changes.

HTH,

-- 

Nate Eldredge nate@cartsys.com
