fd0 causes Kernel PANIC (an unresolved long tale of woe)

Dan St.Andre' (grillon@m3.interserv.com)
Sat, 12 Apr 1997 10:47:02 -0500

I'm a newbie but here is what's happening to me. **I** think it is a
hardware problem, but "... we don't have this problem on DOS ..." is causing
Linux to get a bad name.

Are there any known problems with (a) kernel modules, (b) diskette drivers,
or (c) file system modules? Are they fixed in any release or rpm?

Are you aware of any diskette or diskette controller hardware
configurations (or other hardware, for that matter) that cause or
contribute to these problems?

How might I instrument this so that (1) **I** can get a handle on what
is really happening -- hardware vs. software, and (2) I can collect data for
others to use to resolve any real software issues?

I'm still using the RH 3.0.3 distribution. I cannot do RH4x yet because I
do not have time to sort out all the PAM, CHAP, PAP whatever that I see on
the lists.

The server is a commercial Pentium tower built by Intel.
The clients are PCs with an industrial form factor called PC/104. They are
really PCs in every way except mechanically.

The Situation:
intermittent KERNEL PANIC as a result of troubles with the diskette device
[major number 002].
1) our server never has a hiccup
2) our clients fail routinely
3) Yesterday, Friday 11 April 1997, we could hardly do anything at all.

What we Do:
We are using diskettes for sneaker net by typing
mount /a; cp $FILES /a; umount /a
mount /msdos; cp $FILES /msdos; umount /msdos
[Of course the copy might go the other way.]
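
One way the sequence above might be wrapped so that each step's exit
status is recorded, which could help show whether mount, cp, or umount
is the step that fails. This is only a sketch: sneaker_copy, log_step, and
SNEAKER_LOG are made-up names for illustration, and the log file location
is an assumption.

```shell
#!/bin/sh
# Sketch: run mount / cp / umount one step at a time and log each exit status.
# sneaker_copy, log_step and SNEAKER_LOG are hypothetical names.
SNEAKER_LOG=${SNEAKER_LOG:-/tmp/sneaker.log}

# Run one command and append a timestamped line with its exit status.
log_step() {
    st=0
    "$@" || st=$?
    echo "$(date) status=$st cmd=$*" >> "$SNEAKER_LOG"
    return $st
}

# usage: sneaker_copy MOUNTPOINT FILE...
sneaker_copy() {
    mnt=$1
    shift
    # Give up early if the mount itself fails.
    log_step mount "$mnt" || return 1
    rc=0
    log_step cp "$@" "$mnt" || rc=$?
    # Always try to unmount, even after a failed copy, to release the diskette.
    log_step umount "$mnt" || rc=$?
    return $rc
}
```

It would be called as "sneaker_copy /a $FILES" or "sneaker_copy /msdos $FILES".
The log only records exit statuses; it cannot see a process stuck in D-wait,
since the wrapper itself would be stuck at that point.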

The mount points are in /etc/fstab as follows:

/dev/fd0 /a ext2 user,noauto 0 0
/dev/fd0 /msdos msdos user,noauto 0 0

What we Tried:
1) we modified /etc/rc.d/init.d/syslog to make klogd more verbose
daemon klogd -c 7
2) we modified /etc/fstab to be more defensive
/dev/fd0 /aC ext2 user,noauto,errors=continue,check=strict 0 0
/dev/fd0 /msdosC msdos user,noauto,errors=continue,check=strict 0 0

What we Saw:
1) diskettes written on the server with /aC mount point could not be read
on the clients without a flood of I/O complaints
2) client mount point does not seem to matter
3) things work fine for a while
4) once trouble starts, we must cycle system power

During mount:
floppy0: probe failed...
floppy0: probe failed...
... auto retry
floppy0: probe failed...
floppy0: probe failed...
... success

During mount:
... requesting process appears to stall waiting for mount command completion
... diskette light is on solid
... requesting process is in D-wait
manually eject the diskette
... sometimes mount completes reporting "wrong fs-type..." error
... sometimes we get Kernel PANIC

During copy:
I/O error...
... sometimes the copy retries and succeeds
... sometimes the copy retries and fails
... sometimes we get a Kernel PANIC
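
To collect numbers rather than impressions, the messages quoted above
could be counted in whatever file syslogd writes kernel messages to. A
rough sketch follows; count_floppy_msgs is a made-up helper name, the log
file path is left as an argument because it depends on the local
syslog.conf, and the two patterns are just the strings quoted in this post.

```shell
#!/bin/sh
# Sketch: count floppy-related kernel messages in a saved log file.
# count_floppy_msgs is a hypothetical helper; the patterns are the
# message fragments quoted in this post, not a complete list.

# usage: count_floppy_msgs LOGFILE
count_floppy_msgs() {
    logfile=$1
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    probes=$(grep -c 'floppy0: probe failed' "$logfile" || true)
    ioerrs=$(grep -c 'I/O error' "$logfile" || true)
    echo "probe_failed=$probes io_error=$ioerrs"
}
```

Running this on each client and on the server after a day of use would
give per-machine counts that others could compare.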
