Re: data from kernel.bkbits.net

From: Theodore Ts'o
Date: Mon Nov 24 2003 - 19:31:41 EST


On Mon, Nov 24, 2003 at 02:24:13PM -0800, Larry McVoy wrote:
> > Yes but an attempt to read beyond the limits of the physical
> > drive will provide you with a lot of **interesting** hardware
> > errors. This happens if the file-system gets corrupt.

Sure, but not those kinds of errors. You'll see errors like this
instead:

kernel: attempt to access beyond end of device
kernel: 08:05: rw=0, want=198500353, limit=5779456
kernel: attempt to access beyond end of device
kernel: 08:05: rw=0, want=4294934529, limit=5779456

ATA device timeouts, which is what Larry reported, are not caused by
attempting to read beyond the limits of the physical device.
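
For anyone who wants to check a log for this signature, here is a minimal Python sketch. It assumes the message format is exactly what appears in the excerpt above (other kernel versions may print it differently), and the names `PATTERN`, `beyond_end_requests` and `SAMPLE` are made up for illustration.

```python
import re

# Matches lines such as "kernel: 08:05: rw=0, want=198500353, limit=5779456".
# The format is assumed from the excerpt above; other kernels may differ.
PATTERN = re.compile(
    r"(?P<dev>[0-9a-fA-F]+:[0-9a-fA-F]+): rw=(?P<rw>\d+), "
    r"want=(?P<want>\d+), limit=(?P<limit>\d+)"
)


def beyond_end_requests(log_text):
    """Return (device, want, limit) for each request past the device limit."""
    hits = []
    for line in log_text.splitlines():
        m = PATTERN.search(line)
        if m is None:
            continue  # not a want/limit line
        want, limit = int(m.group("want")), int(m.group("limit"))
        if want > limit:
            hits.append((m.group("dev"), want, limit))
    return hits


SAMPLE = """\
kernel: attempt to access beyond end of device
kernel: 08:05: rw=0, want=198500353, limit=5779456
kernel: attempt to access beyond end of device
kernel: 08:05: rw=0, want=4294934529, limit=5779456
"""

if __name__ == "__main__":
    for dev, want, limit in beyond_end_requests(SAMPLE):
        print(f"{dev}: want={want} exceeds limit={limit} by {want - limit}")
```

If a log shows IDE resets and timeouts but nothing that matches this pattern, that points away from out-of-range requests as the cause.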

> Yeah, I think Richard may be right. Anyway, the drive sort of reads
> from the raw partition. It gets a IDE reset and then it reads. I can
> read it a second time with no reset. Haven't tried a reboot between
> reads, hang on, yeah, a reboot brings the errors back.

It really, really sounds like the disk is pooched. I don't know if it
was bad luck, coincidence, or the fact that it was powered down for a
while. But I'm guessing that it's taking a long time for the disk to
read a sector, which is causing the disk driver to time out and reset
the bus, but then the sector is first cached in the IDE disk cache
(where it can be read quickly) and then it ends up getting cached in
system memory. That would explain why the second read works without a
reset, and why a reboot brings the errors back.
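
One rough way to test that guess is to time each read and see whether the slow spots disappear on a second pass. The following is only a sketch in Python: `time_reads`, the chunk size and the threshold are all arbitrary choices for illustration, and it does not handle the case where a failing drive returns an I/O error (which would surface as an `OSError`) instead of just being slow.

```python
import time


def time_reads(path, chunk_size=64 * 1024, slow_threshold=1.0):
    """Read `path` sequentially; return (offset, seconds) for slow chunks."""
    slow = []
    # buffering=0 avoids Python-level buffering. The OS page cache and the
    # drive's own cache still apply, which is the effect being looked for.
    with open(path, "rb", buffering=0) as f:
        offset = 0
        while True:
            start = time.monotonic()
            data = f.read(chunk_size)
            elapsed = time.monotonic() - start
            if not data:
                break  # end of file or device
            if elapsed >= slow_threshold:
                slow.append((offset, elapsed))
            offset += len(data)
    return slow
```

Running it twice against the raw partition (which needs root) and comparing the two lists would show whether the slow offsets in the first pass come back quickly in the second, as the caching explanation predicts.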

> But, fscking the dd-ed image gets me less errors so I'm trying that
> route to get the data back.

The fact that the dd'ed image is giving you fewer errors, combined with
your other description, makes me really suspicious about the hard
drive. If you're really brave or foolish (or have already
backed up the image), you might try doing a non-destructive read/write
test using the badblocks(8) command. I'm pretty confident that it
will turn up all sorts of problems, though, since the low-level device
driver errors you were describing really are not consistent with
filesystem corruption, but with a hardware failure of some kind.
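
For reference, the non-destructive read/write mode of badblocks(8) is selected with -n; -s shows progress and -v makes it verbose. A small Python sketch of the invocation follows, where `badblocks_command` and `run_badblocks` are hypothetical helper names and the device path is whatever the suspect partition is. The run itself needs root, and badblocks will normally refuse this mode on a mounted device.

```python
import subprocess


def badblocks_command(device):
    """Build the argv for a non-destructive read/write badblocks run."""
    # -n: non-destructive read-write mode, -s: show progress, -v: verbose.
    return ["badblocks", "-n", "-s", "-v", device]


def run_badblocks(device):
    """Run the scan and return its exit status (needs root, unmounted device)."""
    return subprocess.run(badblocks_command(device), check=False).returncode
```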

- Ted