Re: [OT] Redundancy eliminating file systems, breaking MD5, donating money to OSDL

From: Matthias Schniedermeyer
Date: Tue Jan 20 2004 - 14:25:24 EST


On Sat, Jan 17, 2004 at 02:15:31PM +0100, Bart Samwel wrote:
> On Friday 16 January 2004 21:59, Timothy Miller wrote:
> > Valdis.Kletnieks@xxxxxx wrote:
> > > On Fri, 16 Jan 2004 15:22:39 EST, Timothy Miller <miller@xxxxxxxxxxxxxx> said:
> > >>Think about it! If we had a filesystem that actually DID this, and it
> > >>was in the Linux kernel, it would spread far and wide. It's bound to
> > >>happen that someone will identify a collision. We then report that to
> > >>the committee offering the reward and then donate it to OSDL to help
> > >>Linux development.
> > >
> > > Actually, it's *not* "bound to happen". Figure out the number of blocks
> > > you'd need to have even a 1% chance of a birthday collision in a 2**128
> > > space.
> > >
> > > And you'd need that many disk blocks on *a single system*.
> > >
> > > Then figure out the chances of a collision on a small machine that only
> > > has 20 or 30 terabytes (yes, in this case terabytes is small).
> >
> > Certainly. No one machine is going to find it in a reasonable period.
> > OTOH, if a million machines were doing it, it increases the chances by
> > just that much.
>
> Let's take a look at the chances. 30 terabytes is, in a best-case scenario
> (with 512-byte blocks) about 6e10 blocks. That would be roughly
> 6e10*6e10*(2**(-128)), or about 1e-17. With a hundred million machines, the
> chances of a collision would be about 1e-9, disregarding the fact that all
> these machines have a large chance of containing similar blocks -- their data
> isn't truly random, so some blocks have a larger chance of occurring than
> others. The data sets on the machines are probably reasonably static, so if
> the collision isn't found *at once* the chances of it occurring later are
> much smaller. So, even under the most positive assumptions, with a hundred
> million machines with 30 terabytes of storage each, it's extremely probable
> that you won't find a collision. (A 96-bit hash could have been broken with
> this setup however. :) )

There is one fundamental braino in the discussion.

Only HALF the bits protect against "accidental" collisions (the
"birthday" thing). The full width only counts against someone trying to
"brute force" an input that produces a given MD5.(*)

So MD5 has only about 64 bits (roughly 2**64 hashed blocks) against
accidental collisions.
Btw. I have already had (a/the) MD5 collision(*2) in my life.

So you'd need SHA256 or SHA512 to be "really sure(tm)".
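
For anyone who wants to redo the numbers, here is a rough Python sketch of
the birthday arithmetic. It uses the usual approximation
p ~ 1 - exp(-n*(n-1) / 2**(bits+1)) and assumes the hash behaves like a
uniformly random function, which real data and real MD5 only approximate.
The function names are made up for illustration.

```python
import math


def collision_probability(num_blocks, hash_bits):
    """Approximate chance that at least two of num_blocks hashes collide.

    Assumes hash values are uniformly random over 2**hash_bits outputs.
    """
    exponent = num_blocks * (num_blocks - 1) / 2 ** (hash_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-exponent)


def blocks_for_probability(probability, hash_bits):
    """Approximate number of blocks needed to reach the given collision chance."""
    # Invert p = 1 - exp(-n**2 / 2**(bits+1)) for n.
    return math.sqrt(2.0 ** (hash_bits + 1) * -math.log1p(-probability))


if __name__ == "__main__":
    # 30 TB in 512-byte blocks is about 6e10 blocks (figure from the quoted mail).
    print(collision_probability(60_000_000_000, 128))
    # Blocks needed for an even chance of a collision with a 128-bit hash.
    print(blocks_for_probability(0.5, 128))
```

The first figure lands within a factor of two of the 1e-17 quoted above (the
quoted estimate leaves out the factor 1/2), and the second lands a little
above 2**64, which is where the "half the bits" rule of thumb comes from.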



*: AFAIR I read this in the specs of SHA1 (160 bits), so I guess this is
also true for MD5.

*2: I had a directory of about 1.5 million images and "md5sum"med them to
eliminate duplicates. The log file, at one point, had the same md5sum as
one of the pictures.
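
A minimal sketch of how such a duplicate hunt can guard against this kind of
surprise: group files by MD5 first, then confirm each candidate pair byte by
byte before calling it a duplicate. This is not the script I used back then;
the helper names (md5_of_file, find_duplicates) are hypothetical, and only
hashlib and filecmp from the Python standard library are assumed.

```python
import filecmp
import hashlib
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return lists of paths whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path in paths:
        by_digest[md5_of_file(path)].append(path)

    duplicates = []
    for candidates in by_digest.values():
        # Equal digests only make files *candidates*; split them into
        # groups that really are identical by comparing the bytes.
        groups = []
        for path in candidates:
            for group in groups:
                if filecmp.cmp(group[0], path, shallow=False):
                    group.append(path)
                    break
            else:
                groups.append([path])
        duplicates.extend(group for group in groups if len(group) > 1)
    return duplicates
```

Usage would be something like find_duplicates(list_of_image_paths); two files
that merely share a digest end up in separate groups and are not reported.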

See you

--
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.
