Re: [ANNOUNCE] Ramback: faster than a speeding bullet

From: Ric Wheeler
Date: Fri Mar 14 2008 - 11:47:53 EST


Theodore Tso wrote:
On Fri, Mar 14, 2008 at 12:41:31PM +0100, Benny Amorsen wrote:
Ric Wheeler <ric@xxxxxxx> writes:

The only really safe default is to disable the write cache by default
or possibly dynamically disable the write cache when barriers are not
supported by a drive. Both have a severe performance impact and I am
not sure that for most casual users it is a good trade.
So people ARE running their disks in a mode similar to Ramback.

Similar, but not as aggressive. Remember, the size of the write cache
on the hard drive is relatively small (small number of megabytes), and
the drive generally is relatively aggressive about getting the data
out to the platters; it's probably not going to keep unwritten data in
its cache for minutes or hours at a time, let alone days. Of
course, unless you use write barriers or some kind of explicit write
ordering, it's going to write stuff out in an order which is
convenient to the hard drive, not necessarily an order convenient to
the filesystem.

Most drives today have 8-16MB of write cache per disk, and firmware varies in how aggressively it pushes that data out to the platters.

Also, if the system crashes, you don't lose the data in the hard drive's
write cache, whereas the data in Ramback is likely gone. And Ramback
is apparently keeping potentially several gigabytes dirty in memory
and *not* writing it out very aggressively. So the exposure is one of
degree.

In practice, it's interesting that we've had so few people reporting
massive data loss even though write barriers are not being used.
Sure, in absolutely critical situations, it's not a good thing; but if
I had a mail server, where I really wanted to make sure I didn't lose
any e-mail, having a small UPS which could keep the server going for
just a few minutes so it could do a controlled shutdown on a power
failure is probably a better engineering solution from a
cost/benefit/performance point of view, compared to turning on write
barriers and taking up to two orders of magnitude worth of performance
hit.

- Ted

Most people don't see power outages too often - maybe once a year? When you travel with a laptop, you are effectively always on a UPS, which also tends to mask this issue.

The ingest rate at the time of a power hit makes a huge difference as well - basically, pulling the power cord when a box is idle is normally not harmful. Try that when you are really pounding on the disks and you will see corruption aplenty without barriers ;-)

One note - the barrier hit for apps that use fsync() is just half an order of magnitude (say 35 files/sec instead of 120 files/sec, roughly a 3.4x slowdown). If you don't fsync() each file, the impact is lower still.

Still expensive, but might be reasonable for home users on a box with family photos, etc.
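
For anyone who wants to try this on their own box, here is a rough sketch of the fsync()-per-file pattern being discussed. It is not the benchmark behind the 35 and 120 files/sec figures above; the helper names (write_file_durably, files_per_second), the file count and the 4KB file size are all made up for illustration. Whether fsync() also flushes the drive's own write cache depends on the filesystem and on barrier support, so run it with barriers on and off to see the difference on your hardware.

```python
import os
import tempfile
import time


def write_file_durably(path, payload):
    """Write payload to path and fsync() the file before returning.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(payload)
        while view:
            # os.write() may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push this file's data to the device.
        # Whether the drive's write cache is flushed as well depends on
        # the filesystem and on barrier support.
        os.fsync(fd)
    finally:
        os.close(fd)


def files_per_second(directory, count=50, size=4096, sync=True):
    """Create `count` small files and return the observed files/sec.

    The count and size defaults are arbitrary assumptions.
    """
    payload = b"x" * size
    start = time.monotonic()
    for i in range(count):
        path = os.path.join(directory, "msg-%d" % i)
        if sync:
            write_file_durably(path, payload)
        else:
            # No fsync(): the data may sit in the page cache for a while.
            with open(path, "wb") as f:
                f.write(payload)
    elapsed = time.monotonic() - start
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print("fsync per file: %.0f files/sec" % files_per_second(d, sync=True))
    with tempfile.TemporaryDirectory() as d:
        print("no fsync:       %.0f files/sec" % files_per_second(d, sync=False))
```

Point the temporary directory at the filesystem you care about (for example by setting TMPDIR), since /tmp is often a RAM-backed filesystem where fsync() costs next to nothing.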

ric