Re: SCSI Sector Size Problem

Theodore Y. Ts'o (tytso@mit.edu)
Fri, 1 Nov 1996 23:27:25 -0500


From: "Nicholas J. Leon" <nicholas@binary9.net>
Date: Fri, 1 Nov 1996 20:24:51 -0500 (EST)

Oops! The second part of my argument (which I apparently forgot) was
that I then remade the filesystem (to compare the bad sectors,
*MOSTLY* the same) and then

cat /dev/zero > /mnt/foo

and it also generated the same errors??

Did you use the -c option to mke2fs? And did you check (using dumpe2fs)
to see which blocks it marked as bad? And when you say "the same
errors", were the sector numbers really **identical**?

# > hdb: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
# > hdb: read_intr: error=0x40 { UncorrectableError }, LBAsect=306946, sector=306945
# > end_request: I/O error, dev 03:41, sector 306945
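
One way to settle the "identical?" question is to pull the sector numbers
out of the kernel messages from each run and compare them mechanically.
What follows is only a minimal sketch in Python, not a tested tool. It
assumes the messages look like the end_request line quoted above, and the
function names (failed_sectors, compare_runs, sector_to_block) are made up
for illustration. The sector-to-block conversion further assumes 512-byte
sectors, a 1024-byte filesystem block size, and that the reported sector
is relative to the start of the partition holding the filesystem; check
all three against your own setup before comparing with the bad block list
that dumpe2fs reports.

```python
import re

# Matches kernel lines such as:
#   end_request: I/O error, dev 03:41, sector 306945
# Any leading quote prefix (e.g. "# > ") is ignored.
END_REQUEST = re.compile(
    r"end_request: I/O error, dev ([0-9a-fA-F]{2}:[0-9a-fA-F]{2}), sector (\d+)"
)


def failed_sectors(log_text):
    """Return the set of (device, sector) pairs reported in log_text."""
    return {(dev, int(sector)) for dev, sector in END_REQUEST.findall(log_text)}


def compare_runs(first_log, second_log):
    """Split failed sectors into those seen in both runs and in only one."""
    first = failed_sectors(first_log)
    second = failed_sectors(second_log)
    return {
        "both": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


def sector_to_block(sector, sector_size=512, block_size=1024):
    """Map a sector number to a filesystem block number.

    The default sizes are assumptions, not values read from the disk.
    """
    return sector * sector_size // block_size
```

Feed compare_runs the saved kernel messages from the two attempts. If
"only_second" keeps turning up new sectors, the damage is spreading rather
than staying put.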

Keep in mind that once an IDE disk starts going bad, the number of bad
sectors usually grows exponentially, so blocks which *were* good only a
few seconds ago may no longer be good now. Basically, what's going on
is that after the head does a "touch and go" on the disk surface, this
kicks loose filings which start ping-ponging around the disk, generally
causing more damage, and when some of these fragments hit the disk
heads, they can cause the heads to crash into the disk again, and cause
even more damage....

I usually suggest to users that as soon as they start seeing those
errors, they should plan on replacing their disk *soon*, before they
lose all of their data. While it's true that there's some work we can
do to try to make Linux more robust in the face of disk errors, the real
problem is that PC hardware is so cheesy that once it starts failing, a
catastrophic disk failure is usually not far behind.

- Ted