Re: 2.2.6_andrea2.bz2

Michael Schulz (micha@nats.informatik.uni-hamburg.de)
Wed, 5 May 1999 10:59:02 +0200


Chuck Lever:
> take a look at:
>
> http://www.citi.umich.edu/projects/linux-scalability/reports/hash.html
>
Hi Chuck,

i just got curious and tock a glance at your page. Astonishingly you never
used prime numbers for the size of the hashtable. That's bad because hashing
lives from computing modulus of integers to get the associated bucket-id.

So for obvious reasons a hash-function will always reveil better spreading
over the table, when it is totaly calculated within the prime modulus, which
is the tablesize. If you use nonprimes, objects tend to be stored in the same
buckets, which you basicaly what to prevent.

Another property of calculating within modulus is, that you don't get integer
overflows, even though entry-values are big.

For these reasons i wrote a little piece of code to calculate the next higher
prime of a given number. It helps in situations where you have to rehash in
case of too many collisions and where you have to recompute the size of the
hashtable. If you like i can give you that code. You can easily modify it
to fit your needs. Just about 130 lines of C-code.

Regards,
Micha.

-- 
-- Michael Schulz
-- NatS (Natural Language Systems), Uni Hamburg

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/