Re: [PATCH 2/2] Rewrite jump_label.c to use binary search

From: H. Peter Anvin
Date: Wed Sep 22 2010 - 16:55:24 EST


On 09/22/2010 01:41 PM, Mathieu Desnoyers wrote:
>>
>> In the case of multiple instances of the same key you want the perfect
>> hash to point to the cluster of solutions -- a list. Since this is by
>> simply be an array.
>
> Yep, and sorting the section seems like a very natural way to create
> these arrays. So to summarize:
>
> - We add a post-linking step to core image and module build in
> modpost.c.
> - This step accesses exception tables, tracepoint and static jump
> sections.
> - Both tracepoint and static jump need to be sorted.
> - For each of the 3 sections, a perfect hash is computed (creation
> must have the property to always succeed). The perfect hash creation
> should only take into account the first entry of duplicate keys.
> - Each of these perfect hashes would translate into C code that would
> need to be compiled in a post-link phase.
> - Then we can link the perfect hash objects with the rest of the code,
> filling in one symbol per considered section (function pointer to
> the perfect hash function) and setting function pointers in struct
> module for modules.
>
> I'm mostly writing this down as food for thought, since my own
> implementation time is currently focused on other things.
>
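
To make the "cluster of solutions" idea in the summary above concrete,
here is a minimal sketch in C of a lookup in a sorted section that
returns the first entry of a run of duplicate keys, plus the length of
that run. The struct jump_entry_sketch layout and the function names
are hypothetical and are not taken from the patch; a perfect hash would
replace first_entry() but could hand back the same "index of first
duplicate" result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry; the real section layout may differ. */
struct jump_entry_sketch {
	uint32_t key;	/* sort key, may repeat */
	uint32_t code;	/* payload, e.g. a patch site */
};

/* Binary search for the first index whose key is >= the wanted key. */
static size_t first_entry(const struct jump_entry_sketch *tab, size_t n,
			  uint32_t key)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].key < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Number of consecutive entries starting at 'first' that match 'key'. */
static size_t cluster_len(const struct jump_entry_sketch *tab, size_t n,
			  size_t first, uint32_t key)
{
	size_t i = first;

	while (i < n && tab[i].key == key)
		i++;
	return i - first;
}
```

Because the section is sorted, the duplicates are adjacent, so the
"list" per key is just a sub-array and needs no extra storage.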

For what it's worth, here is a working (verified and in use) perfect
hash generator written in Perl:

http://repo.or.cz/w/nasm.git/tree/HEAD:/perllib

Like most other perfect hash generators it needs a prehash. The prehash
should be parameterizable (seedable), produce 2*ceil(log2 n) bits of
hash material, and must not have collisions on the key set. The actual
phash algorithm compresses that down to a perfect hash. The prehash is
typically generated via a pseudorandom algorithm: the particular
implementation pointed to uses one based on CRC64 because it's fast to
compute, but there is a nonzero probability that no seed gives a
collision-free prehash for a given key set; a universal prehash is
guaranteed to exist but is much more expensive. In practice a very
simple prehash is usually sufficient, and one goes for speed.
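
As a rough illustration of how a seeded prehash can be compressed down
to a perfect hash, here is a minimal sketch in C of one well-known
graph-based construction, in the style of Czech, Havas and Majewski. It
is not a transcription of the Perl generator linked above. The names
struct phash, prehash(), try_seed(), phash_build() and phash_lookup(),
the mixing constants, and the retry policy (64 seeds, then a larger
table) are all assumptions made for the example. Each key becomes an
edge between two table slots chosen by the prehash, and g[] is filled
so that g[u] + g[v] equals the key's index; a seed is rejected when
that cannot be satisfied.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct phash {
	uint64_t seed;	/* prehash parameter */
	uint32_t mask;	/* side - 1, where side is a power of two >= n */
	uint32_t *g;	/* 2 * side table entries */
};

/* Seeded prehash: two table indices, one in each half of g[]. */
static void prehash(const struct phash *ph, uint32_t key,
		    uint32_t *u, uint32_t *v)
{
	uint64_t x = (key ^ ph->seed) * 0x9E3779B97F4A7C15ULL;

	x ^= x >> 29;
	x *= 0xBF58476D1CE4E5B9ULL;
	x ^= x >> 32;
	*u = (uint32_t)x & ph->mask;
	*v = ((uint32_t)(x >> 32) & ph->mask) + ph->mask + 1;
}

/* For a key that was in the build set, returns its index in keys[]. */
static uint32_t phash_lookup(const struct phash *ph, uint32_t key)
{
	uint32_t u, v;

	prehash(ph, key, &u, &v);
	return ph->g[u] + ph->g[v];	/* unsigned, wraps mod 2^32 */
}

/* Try to fill g[] so that g[u] + g[v] == i for every key i. */
static bool try_seed(struct phash *ph, const uint32_t *keys, size_t n)
{
	bool *seen = calloc(2 * ((size_t)ph->mask + 1), sizeof(*seen));
	bool ok = seen != NULL;
	uint32_t u, v;
	size_t i;

	while (ok) {
		bool progress = false;
		size_t root = n;

		for (i = 0; i < n; i++) {
			prehash(ph, keys[i], &u, &v);
			if (seen[u] && !seen[v]) {
				ph->g[v] = (uint32_t)i - ph->g[u];
				seen[v] = progress = true;
			} else if (!seen[u] && seen[v]) {
				ph->g[u] = (uint32_t)i - ph->g[v];
				seen[u] = progress = true;
			} else if (!seen[u] && root == n) {
				root = i;	/* edge in an untouched component */
			}
		}
		if (progress)
			continue;
		if (root == n)
			break;		/* every key has been placed */
		prehash(ph, keys[root], &u, &v);
		ph->g[u] = 0;		/* start a new component here */
		seen[u] = true;
	}
	/* An inconsistent cycle or a prehash collision fails this check. */
	for (i = 0; ok && i < n; i++)
		ok = phash_lookup(ph, keys[i]) == (uint32_t)i;
	free(seen);
	return ok;
}

/* Sketch only: assumes distinct keys and n well below 2^31. */
static bool phash_build(struct phash *ph, const uint32_t *keys, size_t n)
{
	uint32_t side = 1;
	int grow;

	while (side < n)
		side <<= 1;
	for (grow = 0; grow < 4; grow++, side <<= 1) {
		ph->mask = side - 1;
		ph->g = calloc(2 * (size_t)side, sizeof(*ph->g));
		if (!ph->g)
			return false;
		for (ph->seed = 1; ph->seed <= 64; ph->seed++)
			if (try_seed(ph, keys, n))
				return true;
		free(ph->g);
		ph->g = NULL;
	}
	return false;
}
```

A lookup for a key that was not in the build set returns an arbitrary
value, so the caller still has to compare against the key stored at the
returned index. Retrying with a new seed, and eventually a larger
table, is how this sketch copes with a seed for which no usable prehash
exists.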

When the keys are binary numbers, an even simpler prehash based on
multiplies or rotates is generally more than sufficient.
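
A prehash of that kind might look like the following sketch in C. The
function names, the two odd multipliers and the rotate count are
arbitrary choices for illustration, not something taken from the Perl
generator; whether a given seed is collision-free for a particular key
set still has to be checked by the generator, which would simply try
another seed if it is not.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by r bits (r taken modulo 32). */
static inline uint32_t rol32(uint32_t x, unsigned int r)
{
	return (x << (r & 31)) | (x >> ((32 - r) & 31));
}

/*
 * Seeded prehash for 32-bit keys: produces two values of 'bits' bits
 * each (1 <= bits <= 32), i.e. 2 * bits of hash material in total.
 * Multiplying by an odd constant permutes the 32-bit values; taking
 * the top bits keeps the best-mixed part of the product.
 */
static inline void int_prehash(uint32_t key, uint32_t seed,
			       unsigned int bits,
			       uint32_t *h1, uint32_t *h2)
{
	uint32_t a = (key ^ seed) * 0x9E3779B1u;
	uint32_t b = rol32(key + seed, 13) * 0x85EBCA6Bu;

	*h1 = a >> (32 - bits);
	*h2 = b >> (32 - bits);
}
```

With n keys one would pass bits = ceil(log2 n), matching the amount of
hash material described above.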

-hpa