Re: more on hash functions

Chuck Lever (cel@monkey.org)
Thu, 15 Apr 1999 11:57:34 -0400 (EDT)


On Wed, 14 Apr 1999, Paul F. Dietz wrote:
> > #define FIRSTBYTE(x) (0xff & (x))
> > #define SECNDBYTE(x) (0xff & ((x)>>8))
> > #define THIRDBYTE(x) (0xff & ((x)>>16))
> >
> > #define hashfn(key) ( (rand1[FIRSTBYTE(key)] << 4) ^ \
> > (rand2[SECNDBYTE(key)] << 2) ^ \
> > (rand1[THIRDBYTE(key)]) )
> >
> > to generate a 12-bit hash table index. this way i don't have to mask off
> > extra bits as the final step, and i can use daintily sized random tables.
>
> This is also clearly bad. Suppose you're hashing keys that
> have a common third byte? The low two bits of your hash
> values will be constant.
>
> You really have to make every bit of the result depend on
> every bit in the key.

in theory that's true. however, i also tried a pair of 10 (or 12) bit
random tables x-ored together without shifting, and the result wasn't much
better. i think you can get away with this if your input data allows it.

but again, using small random tables is to be desired. can you think of a
way to make a random table-driven hash function work with an absolute
minimum amount of memory references?

- Chuck Lever

--
corporate:	<chuckl@netscape.com>
personal:	<chucklever@netscape.net> or <cel@monkey.org>

The Linux Scalability project: http://www.citi.umich.edu/projects/citi-netscape/

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/