Re: Question about tcp hash function tcp_hashfn()

From: Brian F. G. Bidulock
Date: Thu Jun 01 2006 - 14:39:32 EST


Evgeniy,

On Thu, 01 Jun 2006, Evgeniy Polyakov wrote:

I think the sun shines more in Moscow than in Edmonton, so it is not
so random. ;)

>
> Specially for you :)

Thank you for being so gracious and patient with me.

> It does not have artifacts, but it's dispersion is wider than XOR one.
> _Much_ wider, which tends to creation of some specially crafted source
> distribution which ends up in totally broken fairness.
> As usual folded and not folded versions behave exactly the same.
>
> > > With following ip/port selection algo:
> > > if (++sport == 0) {
> > > //saddr++;
> > > sport += 123;
> > > }
> > >
> > > I see yet another jenkins artefacts, but again different from previous
> > > two.
> >
> > Adding primes. Again, the arithmetic series of primes might auto-correlate
> > with the Jenkins function. Or it plain might not like gaps.
> >
>
> I want to confirm three things and one state:
> 1. Jenkins hash has some unacceptible artefacts in some source
> address/port distributions, no matter if it has some law embedded or it
> is (pseudo)-random set.
>
> If there are bugs, bugs exist.

True, artifacts appeared even in the basic arithmetic sequence of
primes. It is quite possible that a large set of natural sequences
might cause artifacts.

>
> 2. If it does not have artifacts it has unacceptible dispersion.

This is likely due to the relatively small sample sets; however, real
experienced data sets would be very small compared to the widest
possible set and might also contain structured gaps.

>
> 3. It is 3 times slower than XOR one (28 seconds for XOR for 2^29
> iterations vs. 101 seconds jhash nonfolded and 109 jhash folded on my AMD64
> 3500+ 2.2 Ghz desktop).

Yes, it is slower by inspection.

>
> 4. I believe it can be tuned or has some gaps inside refactoring logic,
> which can be fixed, but as is it can not be used for fair hash creation.

Yes, I now agree. And, for the purpose of dynamic hash sizing, high
dispersion is worse than artifacts.

For some realistic TCP data sets it appears that XOR is superior.

Thank you again for your efforts in resolving my doubts.

So what are your thoughts about my sequence number approach (for
connected sockets)?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/