Re: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do failed op?

From: Daniel B.
Date: Mon Oct 06 2003 - 15:21:24 EST


"Mudama, Eric" wrote:
...
> > Doesn't the kernel keep track of uncompleted operations,
> > retain the information needed to try again, and try again
> > if there's a failure? If not, why not?
>
> If the disk has write cache enabled, this isn't necessarily possible, since
> there's nothing in the IDE specification that guarantees the order of writes
> to the media without a FLUSH CACHE (EXT) command.

Are you sure? If you issue a write to block 1 and then issue another
write to block 1, it would have to guarantee the relative order of those
writes (or equivalent optimization in the write cache), wouldn't it?
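
To make the "equivalent optimization" concrete, here is a toy model in
Python. It is only a sketch of how I would expect a write-back cache to
behave, not a description of any real drive firmware; the class and
method names (WriteCache, write, flush) are made up for illustration.

```python
class WriteCache:
    """Toy model of a drive's write-back cache (hypothetical)."""

    def __init__(self):
        self.pending = {}  # block number -> newest data not yet on media
        self.media = {}    # block number -> data actually on the media

    def write(self, block, data):
        # A later write to the same block replaces the cached earlier one,
        # so two writes to one block cannot land in the wrong order.
        self.pending[block] = data

    def flush(self, order=None):
        # The drive is free to destage *different* blocks in any order.
        for block in (order or sorted(self.pending)):
            self.media[block] = self.pending[block]
        self.pending.clear()


cache = WriteCache()
cache.write(1, b"old")
cache.write(2, b"other")
cache.write(1, b"new")
cache.flush(order=[2, 1])  # blocks reordered relative to issue order
```

Even with the blocks destaged out of issue order, block 1 ends up
holding the data from the second write, which is the only ordering I am
asking about here.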


> Hypothetically, if you were doing full-pack random writes continuously with
> no idle time and no FLUSH CACHE, you can have writes that are days old still
> in the drive's buffer and still un-attempted. A write with write-cache
> enabled reports ending status at the completion of the transfer. There is
> no mechanism to tell the host that a cached write failed, other than giving
> an error on the next command.
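
As I read the paragraph above, the behaviour is roughly the following.
This is a minimal sketch under my own assumptions; CachedDrive,
destage, and bad_blocks are hypothetical names, and a real drive
reports the failure through its status/error registers rather than a
Python exception.

```python
class CachedDrive:
    """Toy drive with write cache enabled (hypothetical model)."""

    def __init__(self, bad_blocks=()):
        self.bad_blocks = set(bad_blocks)
        self.cache = []             # writes accepted but not yet on media
        self.deferred_error = None  # block number of a failed cached write

    def write(self, block, data):
        # A failure of an earlier cached write surfaces on the next command.
        if self.deferred_error is not None:
            failed, self.deferred_error = self.deferred_error, None
            raise IOError(f"deferred write error on block {failed}")
        self.cache.append((block, data))
        return "ok"  # ending status is given once the transfer completes

    def destage(self):
        # Some time later the drive attempts the media writes.
        for block, _data in self.cache:
            if block in self.bad_blocks:
                self.deferred_error = block
        self.cache.clear()


drive = CachedDrive(bad_blocks={5})
status = drive.write(5, b"x")  # reported as successful to the host
drive.destage()                # the media write fails only now
```

The host has already been told the write to block 5 succeeded; it only
hears about the failure when it issues its next command.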

But we're not talking about errors IN the disk drive after the
communication between the kernel and drive is already done. We're
talking about errors in the communication BETWEEN the kernel and the
drive (lost DMA interrupts), aren't we?

If the kernel issues a write command to the drive, and never gets a
response (DMA-complete interrupt?) from the drive that it has accepted
the command, why can't the kernel repeat the write command?
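
What I have in mind is something like the sketch below (Python, purely
illustrative). It assumes the kernel still holds the data until the
drive confirms, and that reissuing an identical write to the same block
is harmless. FlakyDrive and write_with_retry are hypothetical names,
not anything in the IDE driver.

```python
class FlakyDrive:
    """Hypothetical drive whose first few completions never arrive."""

    def __init__(self, lost_completions):
        self.lost_completions = lost_completions
        self.media = {}

    def write(self, block, data):
        # False models the host timing out while waiting for the
        # completion interrupt; True models a confirmed completion.
        if self.lost_completions > 0:
            self.lost_completions -= 1
            return False
        self.media[block] = data
        return True


def write_with_retry(drive, block, data, max_attempts=3):
    # The host keeps `data` around until the drive confirms the write,
    # so it can simply reissue the same command after a timeout.
    for attempt in range(1, max_attempts + 1):
        if drive.write(block, data):
            return attempt
    raise IOError(f"write to block {block} failed after {max_attempts} attempts")


drive = FlakyDrive(lost_completions=2)
attempts = write_with_retry(drive, 7, b"payload")
```

Here the first two attempts time out and the third is confirmed, so the
data ends up on the drive instead of being silently lost.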

Daniel
--
Daniel Barclay
dsb@xxxxxxxxx