Re: 2.2.6_andrea2.bz2

Chuck Lever (cel@monkey.org)
Sun, 2 May 1999 23:41:18 -0400 (EDT)


On Sun, 2 May 1999, Hans-Joachim Baader wrote:
> Did anybody think of using a secondary hash function for collisions?
> I guess it's too expensive because it would require calculating many
> hash functions in the worst case.

is it really clear that secondary hashing gives a better bucket size
distribution? that's the only advantage i can see. otherwise, a
secondary hash is unquestionably more expensive than a single hash
lookup: searching through one long chain is faster than searching
through a chain half as long, recalculating a second hash function, and
searching again. and if you're storing the same number of objects in a
table the same size as your original single-hash table, the chains will
be just as long on average -- O(n/m), where n is the number of hashed
items and m is the number of buckets.
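
for concreteness, the single-hash case i'm describing is just this (a
toy sketch with made-up names, not kernel code): one hash computation,
then a walk down one chain whose expected length is n/m.

    #include <stddef.h>

    #define NBUCKETS 256                    /* m: the number of buckets */

    struct entry {
            struct entry *next;             /* chain link within one bucket */
            unsigned long key;
            void *data;
    };

    static struct entry *table[NBUCKETS];

    /* any reasonable hash function works here; the point is that a lookup
     * costs exactly one hash computation plus one chain walk */
    static unsigned int hash(unsigned long key)
    {
            return (unsigned int)(key % NBUCKETS);
    }

    static struct entry *lookup(unsigned long key)
    {
            struct entry *e;

            for (e = table[hash(key)]; e != NULL; e = e->next)
                    if (e->key == key)
                            return e;
            return NULL;
    }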

> Another idea: Currently every bucket in the hash table has to be a
> linked list. I haven't tried it, but the linked list could be replaced
> by a tree in order to limit the worst case. Even better, a bucket could
> stay a linked list as long as only one element is in the bucket (the
> most common case). If a second element is added, it becomes the root
> of a tree. Additional elements build the tree. If an element is removed
> from the tree, the normal tree algorithms apply. If the first element
> is removed, and there are other elements in the tree, one tree element
> must be moved to become the new first element.
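
just to be sure i'm reading the proposal right, here is roughly the
shape of that bucket as i understand it (the names are mine, and i've
used a plain unbalanced binary search tree for brevity -- a real
version would presumably need a balanced tree to actually bound the
worst case):

    #include <stddef.h>

    /* one hashed element; left/right are only used once it lives in a tree */
    struct elem {
            unsigned long key;
            void *data;
            struct elem *left, *right;
    };

    /* a bucket: "first" is the zero/one-element fast path, "root" holds any
     * additional colliding elements in a binary search tree */
    struct bucket {
            struct elem *first;
            struct elem *root;
    };

    static void tree_insert(struct elem **root, struct elem *e)
    {
            while (*root != NULL)
                    root = (e->key < (*root)->key) ? &(*root)->left
                                                   : &(*root)->right;
            *root = e;
    }

    /* detach the smallest tree element so it can be promoted to "first" */
    static struct elem *tree_take_min(struct elem **root)
    {
            struct elem *min;

            while ((*root)->left != NULL)
                    root = &(*root)->left;
            min = *root;
            *root = min->right;
            min->left = min->right = NULL;
            return min;
    }

    static void bucket_insert(struct bucket *b, struct elem *e)
    {
            e->left = e->right = NULL;
            if (b->first == NULL)           /* common case: empty bucket */
                    b->first = e;
            else                            /* collision: later elements
                                               go into the tree */
                    tree_insert(&b->root, e);
    }

    static void bucket_remove_first(struct bucket *b)
    {
            if (b->root == NULL) {          /* common case: single element */
                    b->first = NULL;
                    return;
            }
            /* the special case the proposal mentions: pull an element out
             * of the tree to become the new first element */
            b->first = tree_take_min(&b->root);
    }

and even this stripped-down sketch leaves out general deletion from the
tree and any rebalancing.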

this seems expensive and complicated for no good reason. one can
construct a simple-to-understand hash function that will have excellent
average behavior and reasonable worst-case behavior, and even be pretty
cheap to calculate. why would i want to go to the expense of combining
these data structures if it's easier to make simple hashing work well?
combining them makes insertion and deletion significantly more
expensive and more prone to software errors.
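
for illustration, the kind of simple, cheap hash function i mean is a
multiplicative hash along these lines (a toy sketch; the constant is
roughly 2^32 divided by the golden ratio, the table size is made up,
and a 32-bit unsigned int is assumed):

    #include <stdio.h>

    #define HASH_BITS 8                     /* table would have 2^8 buckets */

    /* multiplicative hash: multiply by ~2^32/golden-ratio and keep the top
     * HASH_BITS bits of the 32-bit product */
    static unsigned int hash(unsigned int key)
    {
            unsigned int product = key * 2654435761U;
            return product >> (32 - HASH_BITS);
    }

    int main(void)
    {
            unsigned int k;

            /* keys sharing their low bits (here, multiples of 4096) would
             * all collide in bucket 0 if we just masked the low bits, but
             * they spread out under the multiplicative hash */
            for (k = 0; k < 8; k++)
                    printf("key %u -> bucket %u\n", k * 4096, hash(k * 4096));
            return 0;
    }

one multiply and one shift per key, and since the bucket count is a
power of two, growing the table only changes the shift width.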

> This algorithm would combine the hash table, list, and tree data
> structures, so it would require some extra code to handle special
> cases. But I think it would be very fast because it combines the
> advantages of hash table, list, and tree, and has nearly no overhead
> for the most common cases (buckets with zero or one elements).

take a look at:

http://www.citi.umich.edu/projects/linux-scalability/reports/hash.html

there are a couple of patches and recommendations based on some benchmark
results.

- Chuck Lever

--
corporate:	<chuckl@netscape.com>
personal:	<chucklever@netscape.net> or <cel@monkey.org>

The Linux Scalability project: http://www.citi.umich.edu/projects/linux-scalability/
