Re: writing file to disk: not as easy as it looks

From: Mikulas Patocka
Date: Wed Dec 03 2008 - 12:37:57 EST


On Wed, 3 Dec 2008, Alan Cox wrote:

> > implemented in disk firmware. Write errors are reported for disk
> > connection problems, not media problems.
>
> Media errors are reported for writes when the drive knows there are
> problems. That may be deferred to the cache flush afterwards but the
> information is still generated and shipped back to us - eventually.

It is a question how to process cache flush errors correctly. A cache flush
error reported to one filesystem may belong to data written by another
filesystem. So should some "there was an error" flag be set for all
partitions and reported to every filesystem when it does a cache flush? Or
should the driver record the time of the last error and let the filesystem
query it (so that the filesystem can tell whether the error happened before
or after it was mounted)?

BTW, how does SCSI report cache flush errors? Does it report them on the
SYNCHRONIZE CACHE command, or does it report them as deferred errors in
sense data?

Another point is that unless the sector remap table is full, there should
be no cache flush errors.

> > For connection problems, another solution may be to retry writes
> > indefinitely until the admin aborts it or reconnects the disk. But I don't
> > know how common these recoverable disk connection errors are.
>
> CRC errors, lost IRQs and the like are retried by the midlayer and
> drivers and the error handling strategies will also try things like
> reducing link speeds on repeated CRC errors.

I meant, for example, a loose cable. Does it make sense to retry
indefinitely (until the admin plugs the cable back in or unmounts the
filesystem) or to return an error to the filesystem after a few retries?
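
The two policies can be put side by side in a small userspace C sketch.
Nothing here is taken from the kernel: retry_policy, write_ctx,
write_with_retry and the fake_link simulator are all hypothetical names,
and the fake link simply fails a fixed number of times before the cable
is "plugged back in".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy switch for the two behaviours being compared. */
enum retry_policy {
    RETRY_BOUNDED,       /* give up after max_retries retries */
    RETRY_UNTIL_ABORT    /* keep retrying until the admin aborts */
};

struct write_ctx {
    enum retry_policy policy;
    unsigned max_retries;          /* used only by RETRY_BOUNDED */
    const bool *abort_requested;   /* set by the admin, e.g. on unmount */
};

/* Submits one write; returns 0 on success, nonzero on a link error. */
typedef int (*submit_fn)(void *arg);

/* Simulated disk link that fails failures_left times, then recovers. */
struct fake_link {
    unsigned failures_left;
};

int fake_link_submit(void *arg)
{
    struct fake_link *link = arg;
    if (link->failures_left > 0) {
        link->failures_left--;
        return -1;
    }
    return 0;
}

/* Returns 0 once the write succeeds, -1 if the policy gives up. */
int write_with_retry(const struct write_ctx *ctx, submit_fn submit, void *arg)
{
    unsigned failures = 0;

    assert(ctx != NULL && submit != NULL && ctx->abort_requested != NULL);
    for (;;) {
        if (submit(arg) == 0)
            return 0;
        failures++;
        if (ctx->policy == RETRY_BOUNDED && failures > ctx->max_retries)
            return -1;
        if (ctx->policy == RETRY_UNTIL_ABORT && *ctx->abort_requested)
            return -1;
        /* Real code would sleep or wait for a hotplug event here. */
    }
}
```

With a link that fails three times, RETRY_BOUNDED with max_retries = 2
returns an error to the caller, while RETRY_UNTIL_ABORT rides out the
outage and completes the write on the fourth attempt.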

Mikulas

> Alan
>