The system booted with 2.0.20 and everything looked OK until I tried
to do a compile. As soon as gcc started I got eight identical messages:
Whoops: unlock_buffer: async io complete on unlocked page
The process seemed to hang, but control-c got me back to a bash prompt.
I hit control-alt-delete, and when shutdown reached the point where it
normally unmounts the file systems, it complained instead that they were busy.
Unfortunately, when I booted back to my old 2.0.0 kernel, which had never
before given me any trouble, it ran into problems fsck'ing my user partition.
These weren't the usual fsck complaints one might expect from a corrupted
file system (although the file system probably is corrupted), but rather a
huge mess of nasty messages from the kernel or maybe the SCSI driver, with
lots of numbers and complaints about interrupt problems and such.
I guess I'll try booting an even older kernel in single user mode and see
if anything can be salvaged.
I'm really beginning to wonder about the 53c7,8xx driver; I've had problems
with it on Intel boxes also. For instance, if I dd a CD-ROM image from
/dev/scd0 into a disk file, the system is often left in an unusable state: if
I type 'ls' or '/bin/ls' the shell says that ls can't be found, even though if
I type 'echo /bin/*' ls is definitely there. I wonder if it has somehow
managed to corrupt the kernel's buffer cache. I've also noticed that the
resulting CD-ROM image doesn't always match the contents of the disc.
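One way to pin down where a copy goes wrong is to compare the image against
the disc block by block and report the first offset that differs. The
following is only a minimal sketch: the function name first_mismatch and the
image file name in the usage note are made up for illustration, and it
assumes both paths can simply be read from start to end.

```python
def first_mismatch(path_a, path_b, chunk_size=64 * 1024):
    """Return the byte offset of the first difference, or None if identical.

    Assumes read() returns full chunks until end of data, which holds for
    regular files and block devices opened in buffered binary mode.
    """
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a = a.read(chunk_size)
            block_b = b.read(chunk_size)
            if block_a != block_b:
                # Walk the common prefix to find the exact differing byte.
                for i, (x, y) in enumerate(zip(block_a, block_b)):
                    if x != y:
                        return offset + i
                # No differing byte in the overlap: one side ended early.
                return offset + min(len(block_a), len(block_b))
            if not block_a:
                # Both sides ended at the same point with no difference.
                return None
            offset += len(block_a)
```

Something like first_mismatch("/dev/scd0", "cdrom.img") would then give the
offset of the first bad byte, which might show whether the damage starts on
a sector or buffer boundary. Note that a difference only in length can come
from padding at the end of the disc rather than from corruption.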
I also find it surprising that dd'ing the CD-ROM to disk on an otherwise
idle machine pegs the load average at 1. I'd have thought that the process
would be in an I/O wait most of the time. Perhaps I just don't understand
how the Linux load average is computed.
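As far as I understand it, the Linux load average counts not only runnable
processes but also those in uninterruptible sleep, which is where a process
sits while waiting on disk I/O. If that is right, a single dd blocked on the
SCSI bus would by itself push the load toward 1. Here is a rough sketch of
the calculation; it uses floating point rather than the kernel's fixed-point
arithmetic, the 5-second sample interval and 60-second window are the
conventional values as I recall them, and all names are mine.

```python
import math

SAMPLE_INTERVAL = 5.0   # assumed seconds between samples
WINDOW = 60.0           # one-minute average

def simulate_load(active_counts, load=0.0):
    """Exponentially damped average of the per-sample active task count.

    Each entry of active_counts is the number of tasks that were runnable
    or in uninterruptible sleep when the sample was taken.
    """
    decay = math.exp(-SAMPLE_INTERVAL / WINDOW)
    for n in active_counts:
        # Old load fades by `decay`; the new sample fills in the rest.
        load = load * decay + n * (1.0 - decay)
    return load

# One process stuck in I/O wait for ten minutes (120 samples).
print(round(simulate_load([1] * 120), 2))
```

Under those assumptions one process that is always waiting on the disk
settles at a load of about 1 even though it uses almost no CPU, which would
explain the number I see, though not the other problems.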
At first I thought these might be problems with the kernel or the SCSI CD-ROM
driver, but I don't see them when I use an IDE CD-ROM on my system at work.
More recently I restored a tar backup of a DOS partition from a DAT tape
onto a newly formatted DOS partition. That caused similar problems and
also resulted in a DOS partition so corrupted that the Windows 95 scandisk
program ran out of memory trying to fix it (on a 48 MB system!).
I've had these sorts of problems with various kernel versions from 1.2.11
through 2.0.0 (and now 2.0.20), and on both Alpha and Intel systems. The
only thing common to all instances was heavy use of the 53c7,8xx driver.
Has anyone else had these sorts of problems with the 53c7,8xx driver (or
with other drivers)? Maybe I should try the 538xx driver instead.
Or maybe I should just buy a different SCSI controller. Any suggestions
as to whether I am better off with a BusLogic BT-946C or a QLogic
Fast!SCSI PCI Basic?
Cheers,
Eric