JR> I've tried this idea. I did an MD5 of every block (4KB) in a partition
JR> and counted the number of blocks with the same hash. Only about 5-10% of
JR> blocks on several filesystems were actually duplicates. This might be
JR> better if you reduced the block size to 512 bytes, but there's a
JR> question of how much extra space filesystem structures would then take
JR> up.
JR> Basically, it didn't look like compressing duplicate blocks would
JR> actually be worth the extra structures or CPU.
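(For anyone who wants to repeat that experiment, here is a rough sketch
of the measurement in Python -- my reconstruction, not JR's actual tool.
Run it against an unmounted partition, e.g. /dev/hda1; the device path
is just an example.)

    #!/usr/bin/env python
    # Hash every 4 KB block of a partition with MD5 and count how
    # many blocks are redundant copies of an earlier block.
    import hashlib
    import sys
    from collections import Counter

    BLOCK_SIZE = 4096  # try 512 to test the smaller-block idea

    def duplicate_fraction(path):
        counts = Counter()
        total = 0
        with open(path, "rb") as dev:
            while True:
                block = dev.read(BLOCK_SIZE)
                if len(block) < BLOCK_SIZE:
                    break  # ignore a short tail block
                counts[hashlib.md5(block).digest()] += 1
                total += 1
        # Every block beyond the first copy of a given hash is redundant.
        redundant = sum(n - 1 for n in counts.values() if n > 1)
        return redundant, total

    if __name__ == "__main__":
        redundant, total = duplicate_fraction(sys.argv[1])
        print("%d of %d blocks (%.1f%%) are duplicates"
              % (redundant, total, 100.0 * redundant / max(total, 1)))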
JR> On the other hand, a COW fs would be excellent for making file copying
JR> much quicker. You can do things like copying the linux kernel tree using
JR> 'cp -lR', but the files do not act as if they are unique copies - and
JR> I've been bitten many times when I forgot this. If you had COW, you
JR> could just copy the entire tree and forget about the fact they're
JR> linked.
Yeah, I'm mostly thinking about this kind of COW fs usage. You could
copy gigabytes in an instant and never bother tracking the duplicate
files yourself ("zero blocks left??? where the hell did I copy those
.mpg's???").
Now, sometimes we use hardlinks as a "poor man's COW fs", but
I bet it's error-prone. Every now and then you forget it's a
hardlinked kernel tree and start happily hacking in it... :-(
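To make the trap concrete, here's a minimal sketch (file names invented)
of what 'cp -lR' does per file, and why hacking in the "copy" bites you:

    import os

    with open("original.c", "w") as f:
        f.write("int x = 1;\n")

    os.link("original.c", "copy.c")   # per-file effect of 'cp -lR'

    with open("copy.c", "w") as f:    # "hack on the copy"...
        f.write("int x = 2;\n")

    print(open("original.c").read())  # ...and the original changed too:
                                      # both names are the same inode.
                                      # A COW fs would have given copy.c
                                      # its own blocks on first write.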
A "compressor" which hunts and merges duplicate blocks is a bonus,
not a primary tool.
--
Best regards, VDA
mailto:VDA@port.imtp.ilyichevsk.odessa.ua
http://port.imtp.ilyichevsk.odessa.ua/vda/