Re: raid is dangerous but that's secret (was Re: [patch] ext2/3:document conditions when reliable operation is possible)

From: Christoph Hellwig
Date: Sun Aug 30 2009 - 12:36:10 EST


On Sun, Aug 30, 2009 at 06:44:04PM +0400, Michael Tokarev wrote:
>> If you lose power with the write caches enabled on that same 5 drive
>> RAID set, you could lose as much as 5 * 32MB of freshly written data on
>> a power loss (16-32MB write caches are common on s-ata disks these
>> days).
>
> This is fundamentally wrong. Many filesystems today use either barriers
> or flushes (if barriers are not supported), and the times when disk drives
> were lying to the OS that the cache got flushed are long gone.

While most common filesystems do have barrier support:

 - it is not actually enabled for the two most common filesystems
 - the support for write barriers and cache flushing tends to be buggy
   all over our software stack.

>> For MD5 (and MD6), you really must run with the write cache disabled
>> until we get barriers to work for those configurations.
>
> I highly doubt barriers will ever be supported on anything but simple
> raid1, because it's impossible to guarantee ordering across multiple
> drives. Well, it *is* possible to have write barriers with journalled
> (and/or with battery-backed-cache) raid[456].
>
> Note that even if raid[456] does not support barriers, write cache
> flushes still work.

All currently working barrier implementations on Linux are built upon
queue drains and cache flushes, plus sometimes setting the FUA bit.
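
To make that sequence concrete, here is a rough user-space toy model of
a barrier write built from a queue drain plus cache flushes, with FUA
used for the barrier request when the device offers it. This is only a
sketch, not the block layer code: ToyDisk, ToyQueue, barrier_write and
demo are hypothetical names invented for the illustration, and the
"disk" is just two dictionaries standing in for the volatile write
cache and the stable media.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDisk:
    """Hypothetical disk: a volatile write cache in front of stable media."""
    supports_fua: bool = False
    cache: dict = field(default_factory=dict)  # lost on power failure
    media: dict = field(default_factory=dict)  # survives power failure

    def write(self, block, data, fua=False):
        if fua and self.supports_fua:
            # FUA: the data must be on media before the write completes.
            self.cache.pop(block, None)
            self.media[block] = data
        else:
            # Ordinary write: completes once it is in the volatile cache.
            self.cache[block] = data

    def flush(self):
        # Cache flush: push everything in the volatile cache to media.
        self.media.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        # Whatever was only in the write cache is gone.
        self.cache.clear()


@dataclass
class ToyQueue:
    """Hypothetical request queue in front of a ToyDisk."""
    disk: ToyDisk
    pending: list = field(default_factory=list)

    def submit(self, block, data):
        self.pending.append((block, data))

    def drain(self):
        # Queue drain: dispatch every request submitted so far.
        for block, data in self.pending:
            self.disk.write(block, data)
        self.pending.clear()

    def barrier_write(self, block, data):
        self.drain()       # 1. wait for all earlier requests
        self.disk.flush()  # 2. pre-flush: earlier writes reach media
        if self.disk.supports_fua:
            # 3a. barrier request goes straight to media
            self.disk.write(block, data, fua=True)
        else:
            # 3b. no FUA: plain write followed by a post-flush
            self.disk.write(block, data)
            self.disk.flush()


def demo(supports_fua):
    q = ToyQueue(ToyDisk(supports_fua=supports_fua))
    q.submit(10, "journal data")
    q.barrier_write(11, "commit record")
    q.submit(12, "later write")
    q.drain()
    q.disk.power_loss()
    return q.disk.media
```

In this model demo(True) and demo(False) both leave blocks 10 and 11 on
the media after the simulated power loss, while block 12, which was
only in the write cache, is lost. The point of the toy is only that the
ordering guarantee comes from the drain and the flushes, not from any
ordering support inside the device.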
