Re: 2.2.6_andrea2.bz2

Chuck Lever (cel@monkey.org)
Wed, 5 May 1999 10:59:20 -0400 (EDT)


On Wed, 5 May 1999, Michael Schulz wrote:
> > http://www.citi.umich.edu/projects/linux-scalability/reports/hash.html
>
> i just got curious and tock a glance at your page. Astonishingly you never
> used prime numbers for the size of the hashtable. That's bad because hashing
> lives from computing modulus of integers to get the associated bucket-id.
>
> So for obvious reasons a hash-function will always reveil better spreading
> over the table, when it is totaly calculated within the prime modulus, which
> is the tablesize. If you use nonprimes, objects tend to be stored in the same
> buckets, which you basicaly what to prevent.
>
> Another property of calculating within modulus is, that you don't get integer
> overflows, even though entry-values are big.

michael-

thanks for your note. if you read Knuth, you'll see that prime-sized hash
tables are not necessary if you use multiplicative hashing. and, you *do*
want multiplicative hashing with overflow because modulus hashing requires
a division operation in the hash function, which is more expensive than a
multiplication operation.

in fact, Knuth proves that multiplicative hashing is at least as good as,
and sometimes better than, modulus hashing on a prime-sized table.

read my report again, and you'll see a histogram that shows an almost
perfect bucket size histogram and an 87+% bucket utilization, all with a
simple multiplicative hash function. that's as good as it gets.

- Chuck Lever

--
corporate:	<chuckl@netscape.com>
personal:	<chucklever@netscape.net> or <cel@monkey.org>

The Linux Scalability project: http://www.citi.umich.edu/projects/linux-scalability/

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/