Re: Transparent compression in the FS

From: Davide Libenzi
Date: Thu Oct 16 2003 - 16:38:44 EST


On Thu, 16 Oct 2003, Jeff Garzik wrote:

> Val Henson wrote:
> > Abstract:
> >
> > "Recent research has produced a new and perhaps dangerous technique
> > for uniquely identifying blocks that I will call
> > compare-by-hash. Using this technique, we decide whether two blocks
> > are identical to each other by comparing their hash values, using a
> > collision-resistant hash such as SHA-1. If the hash values match,
> > we assume the blocks are identical without further ado. Users of
> > compare-by-hash argue that this assumption is warranted because the
> > chance of a hash collision between any two randomly generated blocks
> > is estimated to be many orders of magnitude smaller than the chance
> > of many kinds of hardware errors. Further analysis shows that this
> > approach is not as risk-free as it seems at first glance."
>
>
> I'm curious if anyone has done any work on using multiple different
> checksums? For example, the cost of checksumming a single block with
> multiple algorithms (sha1+md5+crc32 for a crazy example), and storing
> each checksum (instead of just one sha1 sum), may be faster than reading
> the block off of disk to compare it with the incoming block. OTOH,
> there is still a mathematical possibility (however-more-remote) of a
> collission...

At that point it is better to extend the fingerprint size, since the SHA1
algorithm has a better distribution compared to md5 and crc32. Probability
estimates are pretty low though. If you consider a 2^32 blocks FS, that
with a 4Kb block size makes a 4 tera FS, the collision probability is in
the orders of 2^(-95) (with a 160 bit fingerprint). That's a pretty low
number. Yes, it is true that the input is not completely random, but a
good property of SHA1 is the one of spreading output result of very
similar input patterns.




- Davide

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/