Something wrong in 2.2.8?????

Michael H. Warfield (mhw@wittsend.com)
Wed, 12 May 1999 18:10:27 -0400 (EDT)


All...

Just had some disturbing problems after switching to 2.2.8.

I was only switched over for a short period of time and was working
on some kernel drivers (Computone serial card drivers - not in the kernel
I was running at the time of the problems). In one window I had my mail
up in elm. In the other window, I went to unpack the kernel sources to
test out a patch. I started a resync on my mailbox about the same time
I started to untar the 2.2.8 for a fresh set of sources. Both got about
halfway through and died. At that point, I could not kill any of my
xterm windows although I could run commands like top. Top showed system
load over 16 and climbing! Could not shut down. Sync would hang. Could
switch to a virtual console but Ctrl-Alt-Delete failed to do anything.
Ctrl-Alt-Backspace out of X Windows just left a grey screen and then couldn't
switch to another virtual console. Had to reboot from the hardware switch...

Ok... Rebooted system and came back up on 2.2.8. Fsck reported
all kinds of problems in the kernel source directory I was untarring. That
figured and wasn't real surprising.

Once the system was back up, I logged back into X Windows and brought
my usual array of xterms back up. I then performed the same actions that
I did before. I deleted the old kernel source tree, remade the directory,
restarted the untar, and then deliberately resynced my mailbox. Bang...
Dead again! Sync still hangs, but I was able to get to another virtual
console. I executed an init 6 and this time it shut down until it was
trying to unmount the local file systems. It died then and there. Could
not even get the "Num Lock" light on the keyboard to toggle. Grrr...

Hit the reset switch and brought it back up. This time back on
2.2.7 which had been running perfectly for me since it first came out (I
patched up from the E-Mail announcement). Fsck found some serious bad
juju on one of the SCSI drives (I use a mix of IDE and SCSI). Had to
fsck the drive manually. One inode was chock full of illegal blocks.

Once again got the system back up and got back into X Windows.
Repeated same fatal steps, this time on 2.2.7. No problem. Did it twice.
No problem. I'm about to go back to 2.2.8 and reconfirm, but something
does not seem right here. I know there's been a lot of work down in the
scheduler, but this seems to be a file system handler problem. At least
it seems to be hanging on intense file system activity.

Lots of user level programs, such as ps, df, and top can still
run so I can get status on some things. It doesn't die until it seems
to need to write to the disk. (That's just my HIGHLY SUBJECTIVE opinion
and may be way off base.)
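
For anyone who wants to check that hunch on their own box, here is a
rough sketch (not something I ran during the hangs) that lists processes
stuck in uninterruptible sleep, state "D". On Linux those count toward
the load average, so a climbing load with ps and top still responsive
would be consistent with tasks blocked on disk I/O. It assumes Python is
available and that /proc/<pid>/stat has the usual "pid (comm) state ..."
layout; the function names are made up for illustration.

```python
import os

def state_from_stat(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST closing parenthesis.
    fields = stat_line[stat_line.rfind(")") + 1:].split()
    return fields[0] if fields else ""

def blocked_pids(proc_root="/proc"):
    """List PIDs currently in uninterruptible sleep (state 'D')."""
    blocked = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, name, "stat"),
                      errors="replace") as f:
                line = f.read()
        except OSError:
            continue  # process exited while we were looking
        if state_from_stat(line) == "D":
            blocked.append(int(name))
    return sorted(blocked)

if __name__ == "__main__":
    print("processes in D state:", blocked_pids())
```

Running it a few times while the load is climbing should show whether
the same PIDs (tar, elm, sync, ...) stay parked in "D".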

Notes:

System: 350MHz K6 Asus P5A-B Mother Board
Kernel: 2.2.8
Distrib: Red Hat 5.2
Memory: 128Meg
Swap: 128Meg
Drives: Mix of 2 Gig IDE and SCSI drives.
SCSI: aic7xxx

Note: /var file system (and subsequently the /var/spool/mail spool
files) is on an IDE drive. My source directories are on one of the SCSI
drives. Both would have been busier than one-armed paper hangers at the
time of the crash.

Ok... Once is chance, twice is coincidence, three times is enemy
action. I'm about to switch back over to 2.2.8 and go for three. Any
suggestions from anyone as to what else I might look out for or what I
might do to determine what blew its brains out????

If it still dies, I'll follow up but then drop back to 2.2.7. I've
got work to get done... Sigh...

Mike

-- 
 Michael H. Warfield    |  (770) 985-6132   |  mhw@WittsEnd.com
  (The Mad Wizard)      |  (770) 925-8248   |  http://www.wittsend.com/mhw/
  NIC whois:  MHW9      |  An optimist believes we live in the best of all
 PGP Key: 0xDF1DD471    |  possible worlds.  A pessimist is sure of it!
