Re: XFS corruption during power-blackout

From: Ric Wheeler
Date: Fri Jul 01 2005 - 09:02:31 EST


Rogério Brito wrote:

> On Jul 01 2005, Jens Axboe wrote:
>> On Fri, Jul 01 2005, David Masover wrote:
>>> Not always possible. Some disks lie and leave caching on anyway.
>>
>> And the same (and others) disks will not honor a flush anyways.
>> Moral of that story - avoid bad hardware.
>
> But how does the end-user know what hardware is "good hardware"? Which
> vendors don't lie (or, at least, lie less than others) regarding HDs?
>
> Thanks, Rogério Brito.



The only real way is to test the drive (and retest whenever you get a new firmware version) and the whole fsync -> write barrier code path.

We use a bus analyzer to make sure that when you fsync() a file, you will see a cache flush command coming across the bus. Of course, that is the easy step ;-)
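For reference, the application side of that path is small. The following is a minimal Python sketch, not code from this thread: it writes a buffer and calls fsync() on the descriptor, which is the point at which a flush command should show up on the bus if the kernel and drive cooperate. The function name write_and_fsync is hypothetical.

```python
import os

def write_and_fsync(path, data):
    """Write data to path and ask the kernel to push it to stable storage.

    Returns the number of bytes written. Whether the drive really flushes
    its cache in response is exactly what has to be verified externally.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Blocks until the kernel reports the file data as durable.
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(data)
```

A return from os.fsync() only tells you what the kernel was told; it does not prove the data reached the platter.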

The second step is to test your system across power failures. We have a test program, "wbtest", that we have used to catch bugs. The basic idea is to write a file to a disk with the write cache turned off, write the same file to the disk with write barriers enabled (and a working cache flush command), and then randomly drop power to the box. It is important to really drop power to the whole box, since a "reset button" push often does not drop power to the drives and will give you false passes.
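The general shape of such a test can be sketched as follows. This is not the wbtest program itself, only an illustration under stated assumptions: the 4 KiB record layout, the function names, and the on_durable callback are all hypothetical. The writer appends sequence-numbered, checksummed records and calls fsync() after each one; after the power cut and reboot, the checker reports the last record that is intact on disk.

```python
import os
import struct
import zlib

BLOCK_SIZE = 4096                      # assumed record size, one per block
HEADER = struct.Struct("<QI")          # sequence number, CRC32 of payload
PAYLOAD_SIZE = BLOCK_SIZE - HEADER.size

def make_record(seq):
    """Build one fixed-size record whose content depends on seq."""
    payload = bytes([seq % 256]) * PAYLOAD_SIZE
    return HEADER.pack(seq, zlib.crc32(payload)) + payload

def append_records(path, count, on_durable):
    """Append count records, fsync after each, then report it as durable.

    on_durable(seq) must record the acknowledgement somewhere that
    survives the power cut (a reference disk, another machine, ...).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for seq in range(count):
            view = memoryview(make_record(seq))
            while view:
                view = view[os.write(fd, view):]
            os.fsync(fd)
            on_durable(seq)
    finally:
        os.close(fd)

def last_valid_record(path):
    """Return the highest sequence number intact on disk, or -1 if none."""
    last = -1
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if len(block) < BLOCK_SIZE:
                break                  # torn or missing record
            seq, crc = HEADER.unpack_from(block)
            if seq != last + 1 or zlib.crc32(block[HEADER.size:]) != crc:
                break                  # out of order or corrupted
            last = seq
    return last
```

After reboot, the drive has failed the test if last_valid_record() returns a number lower than the last sequence number that was acknowledged before power was dropped.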

Our wbtest used to be good at finding holes in the write barrier code using 2.4 kernels and PATA drives, but we have had no luck yet in catching known bugs with this test on 2.6 with S-ATA drives.

Ideas on how to get a more effective test are welcome - it is a very small window that you need to hit to catch a misbehaving drive (i.e., once your write cache flush command has returned, you want to drop power and, on reboot, validate that the platter contains that last IO correctly). If you had enough NVRAM in a test system, you might be able to substitute an NVRAM-backed file system for the write-cache-disabled drive and get closer to catching the window.

The alternative is either to run with the write cache disabled (again, you will need to validate that the drive really disabled the cache) or to buy a mid-range or better storage array that provides a non-volatile (battery-backed) write cache.
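As a first check on the cache setting, you can ask the drive what it reports. The sketch below assumes the hdparm utility is installed, that it is run with sufficient privileges, and that "hdparm -W <device>" prints a line of the form "write-caching = 0 (off)"; the function names are hypothetical. It only shows what the drive claims, so it does not replace the power-fail validation described above.

```python
import re
import subprocess

def parse_write_cache(hdparm_output):
    """Return True/False for the reported write-cache state, None if absent.

    Assumes a line such as ' write-caching =  1 (on)' in the output.
    """
    match = re.search(r"write-caching\s*=\s*(\d+)", hdparm_output)
    if match is None:
        return None
    return match.group(1) != "0"

def drive_reports_write_cache(device):
    """Query the drive via hdparm; this is what the drive says, not proof."""
    result = subprocess.run(
        ["hdparm", "-W", device],
        capture_output=True, text=True, check=True,
    )
    return parse_write_cache(result.stdout)
```

For example, drive_reports_write_cache("/dev/sda") would return False after the cache has been switched off with "hdparm -W0 /dev/sda", provided the drive honours the setting.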


