Re: Finding hardlinks

From: Pavel Machek
Date: Wed Jan 03 2007 - 06:57:32 EST


Hi!

> > > the use of a good hash function. The chance of an accidental
> > > collision is infinitesimally small. For a set of
> > >
> > > 100 files: 0.00000000000003%
> > > 1,000,000 files: 0.000003%
> >
> > I do not think we want to play with probability like this. I mean...
> > imagine 4G files, 1KB each. That's 4TB disk space, not _completely_
> > unreasonable, and collision probability is going to be ~100% due to
> > birthday paradox.
> >
> > You'll still want to back up your 4TB server...
>
> Certainly, but tar isn't going to remember all the inode numbers.
> Even if you solve the storage requirements (not impossible) it would
> have to do (4e9^2)/2=8e18 comparisons, which computers don't have
> enough CPU power just yet.

Storage requirements would be 16GB of RAM... that's small enough. If
you sort, you'll only need 32*2^32 comparisons, and that's doable.

I do not claim it is _likely_. You'd need hardlinks, as you
noticed. But system should work, not "work with high probability", and
I believe we should solve this in long term.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/