Re: more on hash functions

Chuck Lever (cel@monkey.org)
Wed, 14 Apr 1999 17:02:50 -0400 (EDT)


On Wed, 14 Apr 1999, Iain McClatchie wrote:
> Point 1
> -------
> Chuck, have you tried it this way?
>
> Your random table should be 256 entries of 32 bits each. You could
> have 16-bit entries if you know that the hash table will never have
> more than 64K entries (which I suppose is likely...)
>
> So then:
>
> short random_table[256];
> hash_rec_t *chain_head;
>
> tablebits = tablesize - 1;
> hash_fn = random_table[ key & 0xff ] ^
> random_table[ (key >> 8) & 0xff ] ^
> random_table[ (key >> 16) & 0xff ];
> chain_head = hash_table[ hash_fn & tablebits ];

this is close to the way that i originally implemented it, although i
tried with two tables (first and third byte indexed one table, middle byte
indexed the second table), and with three tables.

my random tables were constructed such that, for a 256 entry random table,
the table contained unique random values from 0 to 255. making these 16
bit numbers would only add some hash tablesize flexibility, but i don't
believe it would help the hash function generate better randomness.

> Point 2
> -------
> Paul, you seem to suggest using seperate random tables for each
> of the key load lookups. I can't see why that would be necessary.
> Why is it that one table will not do? Are byte permutations of
> the keys really frequent?

i didn't find it to be real critical for the real data i was hashing.

> Point 3
> -------
> This random_table actually seems pretty reasonable to me: the
> random_table is 512 bytes, which is a small fraction of P6's 16KB L1
> data cache. The four loads take four cycles' load slots, but P6
> probably
> uses a radix-4 or radix-8 integer multiplier, which would take 9 or 5
> cycles latency anyway. (The FPU has the fully pipelined multiplier on
> the P6. I think Alphas actually have fully pipelined integer
> multipliers.)

the multiplicative hash function i created uses 3 instructions in less
than 16 bytes, consumes a fixed number of CPU cycles, and uses no memory
bandwidth (except instruction fetches) to compute the hash value. my goal
was to reduce memory references, cache pollution and evictions, i.e.
making the hash function more memory-bandwidth friendly (since this is of
prime concern on big iron), and making a reasonable bucket size
distribution to lower the average lookup time.

on a CPU of older architecture, i don't think the buffer hash will be a
significant performance bottleneck, so adding a single integer
multiplication in that path will probably not be an issue. if it is, use
the old hash function.

- Chuck Lever

--
corporate:	<chuckl@netscape.com>
personal:	<chucklever@netscape.net> or <cel@monkey.org>

The Linux Scalability project: http://www.citi.umich.edu/projects/citi-netscape/

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/