Re: more on hash functions

Iain McClatchie (iainmcc@ix.netcom.com)
Thu, 15 Apr 1999 15:01:54 -0700


I got a few suggestions about how to use multiple lookups with a
single table. All the suggestions make the hash function itself
slower, and attempt to fix an issue -- hash distribution -- that
doesn't appear to be a problem. I thought I should explain why
the table lookup function is slow.

A multiplication has a scheduling latency of either 5 or 9 cycles on a
P6. Four memory accesses take four cycles on that same P6. So the core
operations for the two hash function are actually very similar in delay,
and the table lookup appears to have a slight edge. The difference is
in the overhead.

A multiplicative hash, at minimum, requires the loading of a constant,
a multiplication, and a shift. Egcs actually transforms some constant
multiplications into a sequence of shifts and adds which may have
shorter latency, but essentially, the shift (and nothing else) goes in
series with the multiplication and as a result the hash function has
very little latency overhead.

A table lookup hash spends quite a lot of time unpacking the bytes
from the key, and furthurmore uses a load slot to unpack each byte.
This makes for 8 load slots, which take 1 cycle each. Even if
fully parallelized with unpacking, we end up with a fair bit of
latency. Worse yet, egcs runs out of registers and ends up shifting
the key value in place on the stack twice, which gobbles two load and
two store slots.

Bottom line: CPUs really suck at bit-shuffling and even byte-shuffling.
If there is some clever way to code the byte unpacking in the table
lookup hash function, perhaps using the x86's trick register file,
it might end up faster than the multiplicative hash.

-Iain

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/