On Mon, 30 Mar 2009, Ric Wheeler wrote:
> I still disagree strongly with the don't force flush idea - we have an
> absolute and critical need to have ordered writes that will survive a power
> failure for any file system that is built on transactions (or data base).
Read that sentence of yours again.
In particular, read the "we" part, and ponder.
YOU have that absolute and critical need.
Others? Likely not so much. The reason people run "data=ordered" on their laptops is not just because it's the default - rather, it's the default _because_ it's the one that avoids most obvious problems. And for 99% of all people, that's what they want.
And as mentioned, if you have to have absolute requirements, you absolutely MUST be using real RAID with real protection (not just RAID0).
Not "should". MUST. If you don't do redundancy, your disk _will_ eventually eat your data. Not because the OS wrote in the wrong order, or the disk cached writes, but simply because bad things do happen.
But turn that around, and say: if you don't have redundant disks, then pretty much by definition those drive flushes won't be guaranteeing your data _anyway_, so why pay the price?
> The big issues are that for s-ata drives, our flush mechanism is really,
> really primitive and brutal. We could/should try to validate a better and less
> onerous mechanism (with ordering tags? experimental flush ranges? etc).
That's one of the issues. The cost of those flushes can be really quite high, and as mentioned, in the absence of redundancy you don't actually get the guarantees that you seem to think that you get.
> I spent a very long time looking at huge numbers of installed systems
> (millions of file systems deployed in the field), including taking part in
> weekly analysis of why things failed, whether the rates of failure went up or
> down with a given configuration, etc. so I can fully appreciate all of the
> ways drives (or SSD's!) can magically eat your data.
Well, I can go mainly by my own anecdotal evidence, and so far I've actually had more catastrophic data failure from failed drives than anything else. OS crashes in the middle of a "yum update"? Yup, been there, done that, it was really painful. But it was painful in a "damn, I need to force a re-install of a couple of rpms" kind of way.
Actual failed drives that got read errors? I seem to average almost one a year. It's been overheating laptops, and it's been power outages that apparently happened at really bad times. I have a UPS now.
> What you have to keep in mind is the order of magnitude of various buckets of
> failures - software crashes/code bugs tend to dominate, followed by drive
> failures, followed by power supplies, etc.
Sure. And those "write flushes" really only cover a rather small percentage. For many setups, the other corruption issues (drive failure) are not just more common, but generally more disastrous anyway. So why would a person like that worry about the (rare) power failure?
> I have personally seen a huge reduction in the "software" rate of failures
> when you get the write barriers (forced write cache flushing) working properly
> with a very large installed base, tested over many years :-)
The software rate of failures should only care about the software write barriers (ie the ones that order the OS elevator - NOT the ones that actually tell the disk to flush itself).
Linus