HARDWARE HELL - POSSIBLE LINUX BUG AND/OR HARDWARE FAILURE ?

From: jjs@mail-tele.ccbc.cc.pa.us
Date: Tue Feb 15 2000 - 14:54:37 EST


First of all, I sincerely apolgise for being off topic here and I don't
want to bother the great work you guys are doing on the linux kernel and
system but this is a problem I been dealing with for 6 months or so and I
am desperate for a answer. Personally I would be happy if my computer
were to burst into flames, at least then I would know what I have to
replace. IF YOU ARE TOO BUSY TO HELP WITH A HARDWARE PROBLEM, PLEASE DON'T
WASTE YOUR TIME AND READ ON :) I KNOW YOU GUYS ARE BUSY

Oh I forgot to mention I fix pc's for a living and been using linux for
almost 4 yrs so I am not a newbie.

MY SYSTEM

-AMD K6-2 400 BOXED CPU (NEVER OVERCLOCKED)
-SOYO 5EHM MOTHERBOARD - VIA MVP3 CHIPSET, AT STYLE 1 MEG L2 CACHE
-2 32 MEG PC100 DIMMS
-WESTERN DIGITAL 10 GIG HD UDMA33, (ALSO HAD THE
 PROBLEM WITH A MAXTOR 5.7 UDMA33 WAS IN THERE INSTEAD)
-CREATIVE LABS TNT 16 MEG AGP 2D/3D CARD - NVIDIA TNT CHIPSET
-AUREAL VORTEX1 CHIPSET PCI SOUNDCARD (ALSO HAD THE PROBLEM WITH A
 ENSONIQ AUDIOPCI ES1370 SOUNDCARD IN THERE INSTEAD)
-NETGEAR FA310TX 10/100 PCI NETWORK CARD - PNIC TULIP CLONE CHIPSET
-NO CDROM CURRENTELY INSTALLED DUE TO FAILURE

If my wording looks weird or does not make sense it's because my nerves
are shot over this. Please feel free to ask me what something means

[Software]
WINDOWS 98
Slackware 7.0 BARE.I bootdisk glibc based distro and Slackware 4.0 rescue
rootdisk libc based distro. GNU SHRED from unstable fileutils series
statically compiled in a redhat 5.x or 6.0 glibc system.

     Okay I few months ago while doing a disk shred for various reasons
(no not porn or hacking darnit!) using GNU shred, a file/device shreder,
Linux just suddentely crashed for no apprentely reason. From then on,
win98 began to just freeze after sitting for a while or while doing
something intensive like a 3d game or demo. Linux off a bootdisk would
sometimes fail to uncompress the kernel due to crc errors or refuse to
load a ramdisk due to invalid size, both of which point to memory
problems. Windows 98 or Linux would just sometimes reboot by themselves in
the middle of stuff. Windows98 would sometimes bomb for no apprentely
reason. The system would sometimes even crash before it started loading a
os and would freeze in the middle of printing a bios message sentance.
Sometimes reseating the ram, playing with buss speeds, etc would seem to
make the problem go away but this was always temporary. Well after many
months of problems I finally got desperate and took the system compleately
apart, cleaned all the pci card contacts and dimm memory contacts with a
eraser and then scrubbed the pci cards, motherboard, cpu, and memory all
with a toothbrush and isopropyl alchol. From then on the system ran fine
with no problems. That lasted only a few months as we will soon see.

    About a month ago while doing a disk shred, the problem appeared again
and Linux just oop'ed in the middle of running GNU shred, sometimes, it
would just crash the program, many times the whole kernel would crash with
"unable to handle null pointer at virtual address, etc" or "stack dump
process ........" (correct me if I got the messages wrong). I have not
had windows98 back on the system since the crashes started again so I
don't know if the problem is back there also but I do know it's not a
power supply, a pci card or a hard drive bug because all those parts have
been swapped out before when the problem arose and the problem was still
there although sometimes it would go away for a while. Just a few days
ago now, Linux has also begun to blow major chunks just doing a dd
if=/dev/zero of=/dev/hda bs=1024k on my 10
gig hd with the process crashing or the whole kernel dying or rebooting.
I managed one time to get a weird error about LRU TABLE CORRUPT or
something

   After many runs to try to duplicate the problem I finally managed to
get enough consistant crashes to discover that when the system starts to
get into one of it's "moods" and crashes almost every time you boot linux
and run that problem or even just turn on the system and have it crash
into the middle of bios messages, ALL THE PROBLEMS STOP IF YOU DISABLE THE
EXTERNAL L2 CACHE. I figured I finally found the source of the problem, a
bad l2 cache but to be sure I played with the ram, first removing 1 of the
dimms and it still crashed, then moving the other dimm to that slot and it
was stable. Now the system is behaving with both dimms in but it has
pretended to be fixed before and started misbehaving very soon. I also
noticed the problem seems to go away for a while if the system if left off
for a while but nothing is overheating from what I can tell.

OKAY!!!! lemme finally end this horribly long message but I am desperate
for advice. I really suspect the motherboard is shot since I had the ram
tested and it was okay. I also forgot to mention that my k6-2 400 is a
boxed cpu which means while you get a 3yr warrenty, you also get a glued
on small heatsink and a cheap ball bearing fan that even with ball
bearings only lasted me 1.5 months. Since I did not want to wait for a
warrenty replacement and I did not want to have to use a fan of the same
crap size and design, I attempted to pry off the glued on heatsink so I
could use a better heatsink fan combo, and I managed to chip ceremanic off
the corner of the cpu. There were no exposed wires or silicon and after
I replaced the fan the systems worked fine for months after that. I am
very tempted to just replace the motherboard and see if this stops the
problems I am having but I wanted to make sure that it's not another
problem that is giving the same symptoms of a bad motherboard. Thanks for
any help and God Bless.

P.S. if this letter gets me booted off the mailing list or is ignored, I
understand completely but this was my last ditch cry for help before I
give up and jump off a bridge or save up for a athlon.

Jason Salopek

     
     

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Feb 15 2000 - 21:00:30 EST