Re: what is our answer to ZFS?

From: Bill Davidsen
Date: Mon Nov 21 2005 - 18:02:44 EST

Rob Landley wrote:
On Monday 21 November 2005 08:19, Tarkan Erimer wrote:

On 11/21/05, Diego Calleja <diegocg@xxxxxxxxx> wrote:
If it happens, Sun or someone will have to port it to Linux.
We will need some VFS changes to handle 128 bit FS, as "Jörn ENGEL"
mentioned in a previous mail in this thread. Is there any plan or action
to make VFS handle 128 bit File Systems like ZFS or future 128 bit

I believe that on 64 bit platforms, Linux has a 64 bit clean VFS. Python says 2**64 is 18446744073709551616, and that's roughly:
18,446,744,073,709,551,616 bytes
18,446,744,073,709 megs
18,446,744,073 gigs
18,446,744 terabytes
18,446 ... what are those, petabytes?
18 Really big lumps of data we won't be using for a while yet.
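
The table above can be reproduced in a few lines. This is only a sketch: it assumes decimal (power-of-1000) units, which is what the comma grouping implies, and the names `UNITS` and `decimal_units` are made up for illustration.

```python
# Break 2**64 bytes down into decimal (power-of-1000) units.
UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes",
         "terabytes", "petabytes", "exabytes"]

def decimal_units(n_bytes):
    """Return (unit name, whole number of that unit) pairs, truncating."""
    return [(name, n_bytes // 1000**i) for i, name in enumerate(UNITS)]

for name, count in decimal_units(2**64):
    print(f"{count:,} {name}")

# In binary (power-of-1024) units the figure is exact: 16 EiB.
print(2**64 // 2**60)
```

So the "really big lumps" on the last line are exabytes, and the line before it is indeed petabytes.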

And that's just 64 bits. Keep in mind it took us around fifty years to burn through the _first_ thirty two (which makes sense, since Moore's Law says we need 1 more bit every 18 months). We may go through it faster than we went through the first 32 bits, but it'll last us a couple decades at least.
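
The arithmetic behind "around fifty years" is simple enough to check. A rough sketch, assuming exactly one extra address bit per 18 months as stated above; `years_to_consume` is a hypothetical helper, not anything from the kernel.

```python
MONTHS_PER_BIT = 18  # the rate assumed in the paragraph above

def years_to_consume(bits, months_per_bit=MONTHS_PER_BIT):
    """Years needed to use up `bits` address bits at one bit per period."""
    return bits * months_per_bit / 12

print(years_to_consume(32))  # 48.0 years for each block of 32 bits
```

At that constant rate the second 32 bits would last as long as the first, so "a couple decades" already allows for growth running about twice as fast.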

Now I'm not saying we won't exhaust 64 bits eventually. Back to chemistry, it takes 6.02*10^23 protons to weigh 1 gram, and that's just about 2^79, so it's feasible that someday we might be able to store more than 64 bits of data per gram, let alone in big room-sized clusters. But it's not going to be for years and years, and that's a design problem for Sun.
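
The 2^79 figure is easy to sanity-check. A quick sketch, using the 6.02*10^23 value quoted above:

```python
import math

PROTONS_PER_GRAM = 6.02e23  # figure quoted in the paragraph above

# How many bits are needed to count that many protons?
bits = math.log2(PROTONS_PER_GRAM)
print(bits)

# For comparison, the exact value of 2**79.
print(2**79)
```

The logarithm comes out just under 79, so "just about 2^79" holds.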

There's a more limiting problem: energy. Assume that the energy to set one bit is the energy to reverse the spin of an electron, and call that s. If each value of a 128 bit address refers to a single byte, then
T = s * 8 * 2^128 and T > B
where T is the total energy to low-level format the storage, and B is the energy to boil all the oceans of Earth. That was in one of the physics magazines earlier this year. There just isn't enough usable energy to write that much data.
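
The inequality can be played with numerically. This is a minimal sketch only: the post gives no figures for s or B, so none are filled in here, and both are left as parameters (in joules) for the reader to supply. The function names are made up for illustration.

```python
ADDRESS_BITS = 128
BITS_PER_BYTE = 8

def total_format_energy(s, address_bits=ADDRESS_BITS):
    """T = s * 8 * 2^address_bits: energy to set every bit once."""
    return s * BITS_PER_BYTE * 2**address_bits

def break_even_bit_energy(b, address_bits=ADDRESS_BITS):
    """Per-bit energy s at which T equals B; any larger s gives T > B."""
    return b / (BITS_PER_BYTE * 2**address_bits)
```

Given an estimate of B, `break_even_bit_energy(B)` shows how large s has to be for the claim T > B to hold.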

Sun is proposing it can predict what storage layout will be efficient for as yet unheard of quantities of data, with unknown access patterns, at least a couple decades from now. It's also proposing that data compression and checksumming are the filesystem's job. Hands up anybody who spots conflicting trends here already? Who thinks the 128 bit requirement came from marketing rather than the engineers?

Not me. If you are going larger than 64 bits, you have no good reason not to double the size: it avoids some problems by fitting nicely in two 64 bit registers without truncation or extension. And we will never need more than 128 bits, so the addressing problems are solved.
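
To illustrate the two-register point: a 128 bit value splits cleanly into a high and a low 64 bit word, and addition needs only a carry between them. This is a sketch in Python rather than anything from ZFS or the kernel, and `split128`, `join128` and `add128` are hypothetical names.

```python
MASK64 = (1 << 64) - 1

def split128(value):
    """Split a 128 bit unsigned value into (high, low) 64 bit words."""
    if not 0 <= value < (1 << 128):
        raise ValueError("value does not fit in 128 bits")
    return value >> 64, value & MASK64

def join128(hi, lo):
    """Recombine two 64 bit words into one 128 bit value."""
    return (hi << 64) | lo

def add128(a, b):
    """Add two (high, low) pairs, carrying from the low word; wraps at 2**128."""
    lo = a[1] + b[1]
    carry = lo >> 64
    hi = (a[0] + b[0] + carry) & MASK64
    return hi, lo & MASK64

print(split128(2**64))              # (1, 0)
print(add128((0, MASK64), (0, 1)))  # (1, 0)
```

A 96 bit address, by contrast, would leave half of the second register unused or force masking on every operation.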

If you're worried about being able to access your data 2 or 3 decades from now, you should _not_ be worried about choice of filesystem. You should be worried about making it _independent_ of what filesystem it's on. For example, none of the current journaling filesystems in Linux were available 20 years ago, because fsck didn't emerge as a bottleneck until filesystem sizes got really big.

I'm gradually copying backups from the 90's off DC600 tapes to CDs, knowing that they will require at least one more copy in my lifetime (hopefully).
--
-bill davidsen (davidsen@xxxxxxx)
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me