Re: [BUG] Possible silent data corruption in filesystems/page cache

From: Andreas Dilger
Date: Thu Jun 02 2016 - 15:32:33 EST

On Jun 1, 2016, at 3:51 AM, Barczak, Mariusz <mariusz.barczak@xxxxxxxxx> wrote:
> We ran a data validation test with a buffered workload on the following
> filesystems: ext3, ext4, and XFS.
> While the page cache was being flushed, the block device driver returned
> an IO error. After dropping the page cache, our validation tool reported
> data corruption.

Hi Mariusz,
it isn't clear what you expect to happen here. If there is an IO error,
then the data is not written to disk and cannot be correct when read back.

The expected behaviour is that the IO error will either be returned immediately
at write() time (this used to be more common with older filesystems), or it
will be returned when calling fsync() on the file to flush cached data to disk.
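A minimal userspace sketch of where an application should expect to see the
error, assuming a POSIX system: check the return value of every write() and
of fsync() before trusting that the data reached the disk. The helper name
write_and_sync() is hypothetical and only for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and flush them to stable
 * storage. Returns 0 on success or a negative errno value on failure. */
int write_and_sync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        /* Buffered write: usually only copies data into the page cache,
         * so a later device error is often NOT reported here. */
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            int err = errno;
            close(fd);
            return -err;
        }
        p += n;
        left -= (size_t)n;
    }

    /* fsync() forces writeback; an IO error during writeback of this
     * file's dirty pages is reported here (typically EIO). */
    if (fsync(fd) < 0) {
        int err = errno;
        close(fd);
        return -err;
    }

    /* close() can also report errors on some filesystems. */
    if (close(fd) < 0)
        return -errno;
    return 0;
}
```

If fsync() fails, the application should treat the data as lost and rewrite
it from its own copy; on Linux, simply retrying fsync() is generally not a
reliable way to get the original pages onto disk.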

> We provide a simple patch to inject IO errors in device mapper.
> We ran a test that verifies the md5sum of a file while IO errors occur.
> The test shows a checksum mismatch.
> Attachments:
> 0001-drivers-md-dm-add-error-injection.patch - device mapper patch

There is already the dm-flakey module that allows injecting errors into
the IO path.

Cheers, Andreas

