Re: Drive error

Theodore Y. Ts'o (tytso@MIT.EDU)
Mon, 13 Apr 1998 20:32:06 -0400


From: Mark Shapiro <mark@award.bios.net>
Date: Sun, 12 Apr 1998 23:39:04 -0400 (EDT)

hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=1988685, sector=17036
end_request: I/O error, dev 03:03, sector 17036

Is this a problem with the hard drive or the controller? Should I replace the
bad component, or is it fixable? It looks like a hardware error rather than a
filesystem error, sooo....

These are low-level errors from the disk driver, reporting errors from
the disk controller. What they mean is that you have hardware errors
developing on your drive. When this happens, *immediately* perform a
backup!! If you have a spare (empty) disk handy, do the backup by using
dd to copy the raw disk image.
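
As a rough illustration of how to read the status= and error= fields in
messages like the ones quoted above, here is a small Python sketch that
decodes the two register values into flag names. The masks are the
standard ATA status and error register bits. The names are modelled on
what the driver printed in the quoted log; names for bits that do not
appear in that log (for example MediaChanged) are my own labels and
should be treated as assumptions.

```python
# Status register bits, highest bit first, so the output order matches
# the order seen in the kernel message.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Error register bits; only meaningful when the Error bit is set in status.
ERROR_BITS = [
    (0x80, "BadSector"),
    (0x40, "UncorrectableError"),
    (0x20, "MediaChanged"),
    (0x10, "SectorIdNotFound"),
    (0x08, "MediaChangeRequested"),
    (0x04, "DriveStatusError"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode(value, table):
    """Return the names of all bits in `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


print(decode(0x51, STATUS_BITS))  # ['DriveReady', 'SeekComplete', 'Error']
print(decode(0x40, ERROR_BITS))   # ['UncorrectableError']
```

The point is only that 0x51 and 0x40 are bit masks reported by the drive
itself, which is why these messages indicate a hardware problem rather
than a filesystem one.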

(Of course, you've been religiously doing backups all the time anyway,
right? :-)
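
For the raw-image backup suggested above, the usual tool is dd itself,
for example something along the lines of
"dd if=/dev/hda of=/dev/hdb bs=64k conv=noerror,sync" (the target
/dev/hdb is a hypothetical spare disk; check the device names twice
before running anything like this). The Python sketch below shows the
same idea under stated assumptions: read the source in fixed-size
blocks, and when a block cannot be read, write zeros in its place and
carry on, so that one bad sector does not abort the whole copy. The
function name, the 64 KB block size, and the zero-fill policy are my
choices for illustration, not anything the driver or dd mandates.

```python
import os


def copy_image(src_path, dst_path, block_size=64 * 1024):
    """Copy src to dst block by block, zero-filling unreadable blocks.

    Returns the byte offsets of the blocks that could not be read.
    """
    bad_offsets = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a file or a block device.
        size = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    data = os.pread(src, want, offset)
                except OSError:
                    # Unreadable block: remember it and substitute zeros
                    # so that later data stays at the right offset.
                    bad_offsets.append(offset)
                    data = b"\0" * want
                if not data:
                    break  # unexpected end of input
                dst.write(data)
                offset += len(data)
    finally:
        os.close(src)
    return bad_offsets
```

A smaller block size loses less data around each bad spot but makes the
copy slower; on a drive that is actively failing, getting one fast pass
done first is usually the better trade.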

Unfortunately, due to the way IDE disk technology fails, when you start
seeing these sorts of errors, in many cases they are a prelude to
massive disk failure. The disk head may have done a "micro-crash", and
skipped across the platter in those places, perhaps caused by some dirt
which managed to get past the filters/seals on the disk, or by some other
deterioration, or perhaps just simply due to someone jostling the drive
at the wrong time. Often, the debris left over from those events will
cascade through the platters, causing even more head crashes, which
raises more debris, and so on, and so on...

If this is the case, you will see an exponentially increasing number of
disk block failures. It will be slow at first, but don't let that lure
you into a false sense of security. In some cases, it may even stop for
a while; but eventually the debris will get dislodged and cause more
damage (especially true on laptop drives, in my experience).

If you use the "badblocks" program, it will undoubtedly show you the bad
blocks, and you can feed them to e2fsck in an attempt to map out those
bad blocks, to prevent the kernel from trying to use those blocks ---
however, do a disk backup first!! There is a fairly good chance that
this is an early warning of massive and complete disk failure.
Your disk may only have a limited number of "reads" left on it, and you
shouldn't waste them on the badblocks program --- back up your data
first, and only then start worrying about using programs like badblocks.
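
Once the backup is done, one practical detail when building a bad block
list is that the kernel message and the filesystem count in different
units. The sketch below converts the "sector=" value from an error
message into a filesystem block number. It assumes that "sector=" is
counted in 512-byte units from the start of the partition, and that the
filesystem uses 1024-byte blocks; both are assumptions you should check
on your own system (dumpe2fs reports the real block size). The function
name is hypothetical.

```python
SECTOR_SIZE = 512  # bytes; assumed unit of the kernel's "sector=" value


def sector_to_fs_block(sector, fs_block_size=1024):
    """Map a partition-relative sector number to a filesystem block number."""
    return sector * SECTOR_SIZE // fs_block_size


print(sector_to_fs_block(17036))        # 8518 with 1024-byte blocks
print(sector_to_fs_block(17036, 4096))  # 2129 with 4096-byte blocks
```

If you let badblocks generate the list instead, give it the same block
size as the filesystem (its -b option) before passing the list to
e2fsck with -l; otherwise the block numbers will not line up.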

In fact, I will often treat the appearance of these failures (especially
on a disk that's more than 2 or 3 years old) as a "Timmy, Lassie's
trying to tell us something" moment, and assume that the system is telling
me that it's time to replace the hard disk. IDE disks are relatively cheap
these days, and by the time a disk has had 3-5 years of hard life, it has
about exhausted its potential lifetime anyway.

- Ted

P.S. Besides, the capacity/dollar of drives has been doubling every
12-18 months --- isn't it time you rewarded yourself with a new drive? :-)

P.P.S. If someone maintaining the Linux FAQ or HOWTOs would like to
include this text in one of the HOWTOs, feel free. You have my
permission to reproduce this as you see fit.
