Re: [ANNOUNCE] Ramback: faster than a speeding bullet
From: Daniel Phillips
Date: Mon Mar 10 2008 - 23:50:53 EST
Hi Alan,
Nice to see so many redhatters taking an avid interest in storage :-)
On Monday 10 March 2008 02:22, Alan Cox wrote:
> > So now you can ask some hard questions: what if the power goes out
> > completely or the host crashes or something else goes wrong while
> > critical data is still in the ramdisk? Easy: use reliable components.
>
> Nice fiction - stuff crashes eventually - not that this isn't useful. For
> a long time simply loading a 2-3GB Ramdisk off hard disk has been a good
> way to build things like compile engines where loss of state is not bad.
Right, and now with ramback you will be able to preserve that state and
have the performance too. It is a wonderful world.
> > If UPS power runs out while ramback still holds unflushed dirty data
> > then things get ugly. Hopefully a fsck -f will be able to pull
> > something useful out of the mess. (This is where you might want to be
> > running Ext3.) The name of the game is to install sufficient UPS power
> > to get your dirty ramdisk data onto stable storage this time, every
> > time.
>
> Ext3 is only going to help you if the ramdisk writeback respects barriers
> and ordering rules ?
I was alluding to e2fsck's amazing repair ability, not ext3's journal.
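For a rough idea of what "sufficient UPS power" means: the battery has to outlast a complete flush of whatever is dirty. This back-of-the-envelope sketch is not part of ramback. It assumes a constant sustained write rate for the backing store and a hypothetical safety factor. The 100 MB/s figure is purely illustrative, and only the 504 GB capacity comes from the Violin link below.

```python
GB = 10**9
MB = 10**6


def min_ups_seconds(dirty_bytes, backing_bytes_per_sec, safety_factor=2.0):
    """Minimum UPS hold-up time (seconds) to flush all dirty ramdisk data.

    Assumes the backing store sustains backing_bytes_per_sec throughout the
    flush; safety_factor is a hypothetical margin for shutdown overhead.
    """
    if backing_bytes_per_sec <= 0:
        raise ValueError("backing store bandwidth must be positive")
    return safety_factor * dirty_bytes / backing_bytes_per_sec


# Worst case: all 504 GB dirty, backing store writing an assumed 100 MB/s.
needed = min_ups_seconds(504 * GB, 100 * MB)
print(f"{needed / 60:.0f} minutes of battery")  # 168 minutes
```

The point is that the hold-up time scales with dirty data divided by backing store bandwidth, so a big, fully dirty ramdisk in front of a slow disk needs a correspondingly big battery.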
> > * Previously saved data must be reloaded into the ramdisk on startup.
>
> /bin/cp from initrd
But that does not satisfy the requirement you snipped:
* Applications need to be able to read and write ramback data during
initial loading.
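One way to meet that requirement is shown below as a toy single-threaded model, not the actual ramback code. It keeps a per-chunk "loaded" bit and demand-loads any chunk an application reads before the background loader reaches it. A full-chunk write sets the bit, so the loader never overwrites newer data with the saved copy. The class and method names are hypothetical. Partial-chunk writes, which would need a load first, are not modelled, and a real driver would need locking between the loader and application IO.

```python
class LoadingRamdisk:
    """Toy ramdisk that serves IO while being reloaded from backing store."""

    def __init__(self, backing):
        self.backing = backing                  # previously saved chunks
        self.ram = [None] * len(backing)        # ramdisk contents
        self.loaded = [False] * len(backing)    # per-chunk "loaded" bit
        self.next_to_load = 0                   # background loader cursor

    def _load(self, i):
        if not self.loaded[i]:
            self.ram[i] = self.backing[i]
            self.loaded[i] = True

    def read(self, i):
        self._load(i)        # demand-load a chunk the loader hasn't reached
        return self.ram[i]

    def write(self, i, data):
        # A full-chunk write supersedes the saved copy; setting the bit
        # stops the background loader from clobbering the new data later.
        self.ram[i] = data
        self.loaded[i] = True

    def load_step(self):
        """Load the next not-yet-loaded chunk; return False when done."""
        while self.next_to_load < len(self.backing):
            i = self.next_to_load
            self.next_to_load += 1
            if not self.loaded[i]:
                self._load(i)
                return True
        return False


disk = LoadingRamdisk(["a", "b", "c"])
disk.write(2, "C")       # application writes before the loader gets there
print(disk.read(1))      # demand load: b
while disk.load_step():
    pass
print(disk.ram)          # ['a', 'b', 'C']
```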
> > * Cannot transfer directly between ramdisk and backing store, so must
> > first transfer into memory then relaunch to destination.
>
> Why not - providing you clear the dirty bit before the write and you
> check it again after ? And on the disk size as you are going to have to
More accurately: in general, we cannot transfer directly. The ramdisk may
be external and not present a memory interface. Even an external
ramdisk with a memory interface (the Violin box has this) would require
extra programming to maintain cache consistency. Then there is the
issue of upcoming ramdisks that exceed the 40-bit physical addressing
of current-generation processors.
Even for the simple case where the ramdisk is just part of the kernel
unified cache, I would rather not go delving into that code when these
transfers are on the slow path anyway. Application IO does its normal
single copy_to/from_user thing. If somebody wants to fiddle with the VM,
the place to attack is right there. The copy_to/from_user can be
eliminated (provided alignment requirements are met) using stupid page
table tricks. In spite of Linus claiming there is no performance win
to be had, I would like to see that put to the test.
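The following is a toy model of a flush path that stages each chunk through an intermediate buffer. It uses the dirty-bit ordering Alan suggests: clear the bit before copying, then check it again afterwards. A write that lands mid-flush sets the bit again, so the chunk is simply written again on a later pass. All names are hypothetical, the concurrent write is simulated with a callback, and this is not the ramback implementation.

```python
def mark_dirty(ram, dirty, chunk_size, offset, data):
    """Application write: update the ramdisk and set each touched chunk's bit."""
    ram[offset:offset + len(data)] = data
    first = offset // chunk_size
    last = (offset + len(data) - 1) // chunk_size
    for c in range(first, last + 1):
        dirty[c] = True


def flush_chunk(ram, backing, dirty, i, chunk_size, concurrent_write=None):
    """Copy chunk i ramdisk -> buffer -> backing store; True if re-dirtied."""
    dirty[i] = False                          # 1. clear the bit before copying
    start, end = i * chunk_size, (i + 1) * chunk_size
    bounce = bytes(ram[start:end])            # 2. ramdisk -> memory buffer
    if concurrent_write is not None:
        concurrent_write()                    # simulated racing application write
    backing[start:end] = bounce               # 3. memory buffer -> backing store
    return dirty[i]                           # 4. check again afterwards


def flush_all(ram, backing, dirty, chunk_size):
    """Keep flushing until no chunk is dirty."""
    while any(dirty):
        for i in range(len(dirty)):
            if dirty[i]:
                flush_chunk(ram, backing, dirty, i, chunk_size)


CHUNK = 4
ram, backing, dirty = bytearray(16), bytearray(16), [False] * 4
mark_dirty(ram, dirty, CHUNK, 0, b"abcd")
raced = flush_chunk(ram, backing, dirty, 0, CHUNK,
                    concurrent_write=lambda: mark_dirty(ram, dirty, CHUNK, 1, b"X"))
print(raced, bytes(backing[:4]))   # True b'abcd' (stale, but bit is set again)
flush_all(ram, backing, dirty, CHUNK)
print(bytes(backing[:4]))          # b'aXcd'
```

The stale copy on backing store is harmless here because the chunk stays marked dirty until a flush completes without interference.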
> suck all the content back in presumably a log structure is not a big
> concern ?
Sorry, I failed to parse that.
> > * Per chunk locking is not feasible for a terabyte scale ramdisk.
>
> And we care 8) ?
"640K should be enough for anyone"
http://www.violin-memory.com/products/violin1010.html <- 504 GB ramdisk
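Here is rough arithmetic behind the per-chunk locking point. It compares a lock per chunk against one dirty bit per chunk, assuming 4 KiB chunks and 4 bytes per lock. The lock size is an assumption, and kernel lock debugging options make locks much larger. The capacities are the 504 GB Violin box above and a round 1 TB.

```python
PAGE = 4096                 # assumed chunk size: one 4 KiB page
GB, TB = 10**9, 10**12


def chunks(capacity_bytes, chunk_bytes=PAGE):
    return -(-capacity_bytes // chunk_bytes)            # ceiling division


def metadata_bytes(capacity_bytes, bits_per_chunk, chunk_bytes=PAGE):
    """Bytes needed when every chunk carries bits_per_chunk bits of state."""
    return -(-(chunks(capacity_bytes, chunk_bytes) * bits_per_chunk) // 8)


LOCK_BITS = 32   # assumption: a 4-byte lock per chunk (no lock debugging)
for name, cap in [("504 GB", 504 * GB), ("1 TB", TB)]:
    locks = metadata_bytes(cap, LOCK_BITS)
    bitmap = metadata_bytes(cap, 1)
    print(f"{name}: {locks / 1e6:.0f} MB of locks vs {bitmap / 1e6:.1f} MB of bitmap")
```

For 1 TB this works out to roughly 977 MB of locks against roughly 30.5 MB of bitmap, before counting any contention or cache footprint.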
> > * Handle chunk size other than PAGE_SIZE.
>
> If you are prepared to go bigger than the fs chunk size so lose the
> ordering guarantees your chunk size really ought to be *big* IMHO
The finer the granularity, the faster the ramdisk syncs to backing
store. The only attraction of coarse granularity I know of is
shrinking the bitmap, which is currently not large enough to present
a problem.
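The following puts numbers on the granularity tradeoff, assuming the flusher always writes back whole chunks. A small write dirties its entire chunk, so coarse chunks multiply the data that has to reach backing store, while the bitmap shrinks in proportion. The write pattern is made up for illustration.

```python
def writeback_bytes(writes, chunk_bytes):
    """Bytes copied to backing store if each dirty chunk is written back whole.

    writes: (offset, length) pairs with length >= 1.
    """
    touched = set()
    for offset, length in writes:
        first = offset // chunk_bytes
        last = (offset + length - 1) // chunk_bytes
        touched.update(range(first, last + 1))
    return len(touched) * chunk_bytes


def bitmap_bytes(capacity_bytes, chunk_bytes):
    """One dirty bit per chunk, rounded up to whole bytes."""
    n_chunks = -(-capacity_bytes // chunk_bytes)
    return -(-n_chunks // 8)


# Hypothetical workload: four scattered 512-byte writes.
writes = [(0, 512), (8192, 512), (1 << 20, 512), (5 << 20, 512)]
for chunk in (4096, 1 << 20):
    print(chunk, writeback_bytes(writes, chunk), bitmap_bytes(504 * 10**9, chunk))
```

Four 512-byte writes cost 16 KiB of write-back at 4 KiB granularity but 3 MiB at 1 MiB granularity, while the bitmap for 504 GB only drops from about 15 MB to about 60 KB.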
Your comment re fs chunk size reveals that I have failed to
communicate the most basic principle of the ramback design: the
backing store is not expected to represent a consistent filesystem
state during normal operation. Only the ramdisk needs to maintain a
consistent state, which I have taken care to ensure. You just need
to believe in your battery, Linux and the hardware it runs on. Which
of these do you mistrust?
Regards,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/