Re: good 2.1.x SMP kernel is? [bad cpu?]

John Kennedy (jk@csuchico.edu)
Sat, 2 Jan 1999 16:33:19 -0800 (PST)


01/02/99 @ 04:14:07 PM (Saturday)

After more multi-OS testing, it looks like the "other" CPU
reboots for no good reason after long uptime.

Originally, I burned it in with a lot of Half-Life under win98.
I did some marathon sessions -- more than 12 hours on one weekend.
No problems, but win98 only uses one CPU. For the purposes of my
ASUS P2B-DS, I think that I'll call it the "primary" CPU and it
sits in the front slot (farther away from the rear-entry to the
PCI/ISA cards, etc).

When I started the slow process of trying to distrust and eliminate
everything, I just happened to pull out the "bad" CPU in the
secondary slot first -- it was just easier to get ahold of.
Did about a week's worth of hard labor with no problems with the
"good" CPU.

I then rotated the CPUs (so the good CPU was in the secondary
slot and the bad CPU was in the primary). All the problems came
back, of course, and they were either worse because I was actively
trying to aggravate *something* or because the bad CPU was getting
worked a little bit harder in the primary slot.

So, for phase 3 of the CPU tests, I then pulled out the "good"
CPU and was trying to run the bad CPU in the now-known-good
primary CPU slot (worked fine for a week with the good CPU in it,
in any case). Reboots are still there!

Better yet, Linux isn't the only OS affected. I had to play Quake2 for
about 6 hours (oh, twist my arm), but I got it to do the same thing
under win98. Just reboots for no good reason.

Now I've got the good CPU back in and I'm doing some retesting.
The bad CPU managed to finish my full build script (4-5 hours of hard
work) without crashing only once in 20+ attempts. With runs that long,
I probably only have time for 1-2 tests today.

I've never seen a CPU fail that way. If I had, I would have
beaten on the individual CPUs a lot sooner. I don't think it is
overheating -- it is a 450MHz, but it isn't overclocked, has a
working CPU fan, and I've added even more cooling to the case over
the period of testing. The twin CPU has no trouble in the same
environment, either.

Has anyone else out there seen a CPU fail this way before?

--- john
