Re: more git updates..

From: Kai Henningsen
Date: Sun Apr 24 2005 - 03:40:41 EST


pj@xxxxxxx (Paul Jackson) wrote on 23.04.05 in <20050423174227.51360d63.pj@xxxxxxx>:

> > It's an unavoidable
> > result of using less bits than the original data has.
>
> Even _not_ using a hash will have collisions - copy different globs of
> data around enough, and sooner or later, two globs that started out
> different will end up the same, due to errors in our computers. Even
> ECC on all the buses, channels, and memory will just reduce this chance.

Umm, the whole point of using a digest for the name is to catch these
things as they happen. So if you'd use the whole original bit sequence as
a name, you'd need to have exactly the same bit errors in the data, in the
name, and in the reference to the object, to miss nopticing the problem
early. And it *still* isn't a collision - the data behind name X is
exactly X, always, or it's easily recognizable as broken.

Whereas a hash collision means that both X and Y should be behind name Z.
Both are *correct* behind name Z.

Entirely different situations.

> There is no mathematical perfection obtainable here. Deal with it.

Actually, there is, and your non-hashed name system achieves it.

> If something is likely to happen less than once in a billion years,
> then for all practical purposes, it won't happen.

If that was a truely random thing, then you might have been right. But it
isn't. All possible blobs to a given digest are NOT equally probably (or
of a probability only depending on their size).

We really, really don't know how likely a collision is for the data we
want to store there - just for truely random data.

MfG Kai
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/