HELP!! Serious stability problems with Linux! Where to go from here?

Jeremy Domingue (jer@hughes.net)
Sat, 27 Jun 1998 17:42:53 -0700


Hey all.... and HELP!

Well, this mailing list is basically a last resort of mine, because I have
nowhere else to go. I am totally frustrated with Linux at the moment.

Let me start at the beginning. I know this is going to be long, but please
bear with me here :).

A few months ago, I decided to put a second processor on my server. Great, I
thought, this server is going to completely ROCK on Linux.....not so.

At that time I was running 2.0.33, and the server was crashing anywhere from
30 minutes to 3 days after it was booted. The messages that I was getting
were:

Aiee: scheduling in interrupt 0012c39d (a whole bunch of these)
Kernel panic: FORWARDED INTERRUPT TIMEOUT (AKP=1, Saved AKP=0)

So, after that, I came to this list and was told to install 2.0.34pre11b,
which had fixes in it for this problem. Great, I said, no problem. I
proceeded to install 2.0.34pre11b on the server, and didn't have a hitch.
So, at that point, I thought life was great and everything was fine.

A few days after that ordeal, I had to relocate the server about 45 miles
from me and try to run it remotely. About a week after everything was moved
over, the server began crashing again about every 3 days. It is a heavily
used machine, and the system load did spike quite a bit, so I just figured
it needed another upgrade (wasn't able to look at the console at that time).
It took me a while to gather the cash, so for the past 2 months or so I have
just been rebooting the server every time it crashed. I finally got the
memory upgraded to a total of 512MB recently, and upgraded to the final
2.0.34. Once again, I thought, everything was going to be great. No
change..... crash, crash, crash, crash, crash, crash, crash.

So... next idea? Bad hardware maybe? I proceeded to get a new motherboard
and all new memory from Gateway (I was surprised they were even willing to
trade my old hardware for new without my having any sort of proof that the
old was bad). Still.... crashes, and more crashes.

I finally got to the console of the server shortly after one of my crashes,
and what did I find but the same "Aiee: scheduling in interrupt" and "Kernel
panic: FORWARDED INTERRUPT TIMEOUT" messages I thought I had taken care of
so long ago.

Okay, so, I figured, there were newer kernels in the 2.1 line, maybe if I
tried that I could get past this bug. So, yesterday, I went to install
2.1.107 (and yes, I did update all of the Linux components to at least the
required versions outlined in the Changes file in 2.1.107). Again, I have
hit a roadblock with Linux.

First I tried a module install. I am running SCSI drives, so of course I
need the SCSI module, and I have to run mkinitrd so it can load at bootup.
Every time I try to use mkinitrd I get:

mount: the kernel does not recognize /dev/loop0 as a block device
(maybe `insmod driver'?)
Can't get a loopback device

Okay... so.... let's try to build the driver into the kernel, can't miss on
that one.

Everything went fine until I rebooted. After it detected the SCSI card, it
did a partition check and detected my hard drives as 6MB each, with the wrong
sector count! I have two 4.1 GB hard drives, so as soon as I saw that fly by
I knew this wouldn't be good. After the partition checks, it went to mount
root and I got an "unknown partition" error, and the boot halted.

So, here I am today, back on the ever-crashing 2.0.34, pleading for help to
get my server to stay up like a real server is supposed to. I have no idea
where to go from here with Linux. It is also very sad that my NT server
stays up for 60 days, and I am lucky if my Linux server stays up 60 hours.

If anyone out there can give me any type of help with this stability
problem, I would be deeply indebted and extremely grateful. At this point I
would do almost anything if Linux would just work like I know it can.

And, thanks for reading this far :)

Jeremy Domingue
jer@hughes.net

SERVER CONFIGURATION:

Gateway 2000 NS-7000 266
Dual Pentium II 266 MHz
512MB ECC EDO SDRAM
Adaptec 7880 on-board SCSI controller
2 x 4.1 GB IBM Hard Drives
Plextor 12x SCSI CD-ROM Drive
RedHat 5.0 / Kernel 2.0.34

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu