Re: Transparent compression in the FS

From: John Bradford
Date: Thu Oct 16 2003 - 13:28:35 EST


Quote from Larry McVoy <lm@xxxxxxxxxxxx>:
> On Wed, Oct 15, 2003 at 11:13:27AM -0400, Jeff Garzik wrote:
> > Josh and others should take a look at Plan9's venti file storage method
> > -- archival storage is a series of unordered blocks, all of which are
> > indexed by the sha1 hash of their contents. This magically coalesces
> > all duplicate blocks by its very nature, including the loooooong runs of
> > zeroes that you'll find in many filesystems. I bet savings on "all
> > bytes in this block are zero" are worth a bunch right there.
>
> The only problem with this is that you can get false positives. Val Henson
> recently wrote a paper about this. It's really unlikely that you get false
> positives but it can happen and it has happened in the field.
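
For concreteness, the scheme Jeff describes boils down to something
like the toy sketch below - not venti's actual code, just an
illustration with hypothetical names, and OpenSSL's SHA1() standing
in for whatever the real implementation uses:

    #include <stdlib.h>
    #include <string.h>
    #include <openssl/sha.h>        /* link with -lcrypto */

    #define BLOCK_SIZE 4096

    struct block {
        unsigned char digest[SHA_DIGEST_LENGTH]; /* 20-byte SHA-1 */
        unsigned char data[BLOCK_SIZE];
        struct block *next;                      /* bucket chain */
    };

    /* Store a block, coalescing duplicates: if a block with the
     * same SHA-1 digest is already present, reuse it instead of
     * storing a copy.  Note that this trusts the digest alone to
     * prove equality - which is exactly where false positives
     * sneak in. */
    struct block *store_block(struct block **table, size_t nbuckets,
                              const unsigned char *data)
    {
        unsigned char digest[SHA_DIGEST_LENGTH];
        size_t i;
        struct block *b;

        SHA1(data, BLOCK_SIZE, digest);
        i = digest[0] % nbuckets;

        for (b = table[i]; b; b = b->next)
            if (!memcmp(b->digest, digest, SHA_DIGEST_LENGTH))
                return b;       /* assumed identical, no byte check */

        b = malloc(sizeof(*b));
        if (!b)
            return NULL;
        memcpy(b->digest, digest, sizeof(b->digest));
        memcpy(b->data, data, BLOCK_SIZE);
        b->next = table[i];
        table[i] = b;
        return b;
    }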

Surely it's just common sense to say that you have to verify the
whole block - by the pigeonhole principle, any function that maps N
possible values onto fewer than N outputs must map at least two
distinct inputs to the same output. Reducing a whole block to a
160-bit digest is lossy by definition, so a matching hash alone can
never prove that two blocks are identical.
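
The fix costs one extra comparison per digest match. A minimal
sketch, reusing the hypothetical struct block above:

    /* Pigeonhole: there are 2^(8*4096) possible 4K blocks but only
     * 2^160 SHA-1 digests, so distinct blocks necessarily collide
     * somewhere.  A matching digest is a hint, not a proof; check
     * the full block before coalescing. */
    static int blocks_identical(const struct block *b,
                                const unsigned char *digest,
                                const unsigned char *data)
    {
        return !memcmp(b->digest, digest, SHA_DIGEST_LENGTH) &&
               !memcmp(b->data, data, BLOCK_SIZE);
    }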

John.