Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism

From: Sandy Harris
Date: Sun Jun 12 2016 - 08:18:41 EST


On Thu, Apr 28, 2016 at 10:25 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> It's the hashes that _look_ like they might be good hashes, but
> there's not a lot of analysis behind it, that I would worry about. The
> simple prime modulus _should_ be fine, but at the same time I kind of
> suspect we can do better. Especially since it has two multiplications.
>
> Looking around, there's
>
> http://burtleburtle.net/bob/hash/integer.html
>
> and that 32-bit "full avalanche" hash in six shifts looks like it
> could be better. You wouldn't want to inline it, but the point of a
> full avalanche bit mixing _should_ be that you could avoid the whole
> "upper bits" part, and it should work independently of the target set
> size.
>
> So if that hash works better, it would be a pretty good replacement
> option for hash_int().
>
> There is also
>
> https://gist.github.com/badboy/6267743
>
> that has a 64 bit to 32 bit hash function that might be useful for
> "hash_long()".
>
> Most of the people who worry about hashes tend to have strings to
> hash, not just a single word like a pointer, but there's clearly
> people around who have tried to search for good hashes that really
> spread out the bits.
>
> Linus

Here's another possibility, from my GPL code at:
https://github.com/sandy-harris/maxwell

Not very efficient -- two each of 32-bit multiply & modulo
in most cases -- but provably good mixing.

/*
Quasi-Hadamard transform
My own invention

Goal is to mix a 32-bit object
so that each output bit depends
on every input bit

Underlying primitive is IDEA
multiplication which mixes
a pair of 16-bit objects

This is analogous to the
pseudo-Hadamard transform
(PHT) originally from the
SAFER cipher, later in
Twofish and others

Conceptually, a two-way PHT
on a,b is:

x = a + b
y = a + 2b
a = x
b = y

This is reversible; it loses
no information. Each output
word depends on both inputs.

A PHT can be implemented as

a += b
b += a

which is faster and avoids
using intermediate variables

QHT is the same thing using
IDEA multiplication instead
of addition, calculating a*b
and a*b^2 instead of a+b and
a+2b

IDEA multiplication operates
on 16-bit chunks and makes
every output bit depend on
all input bits. Therefore
QHT is close to an ideal
mixer for 32-bit words.
*/

u32 qht(u32 x)
{
u32 a, b ;
a = x >> 16 ; // high 16 bits
b = x & 0xffff ; // low 16
a = idea(a,b) ; // a *= b
b = idea(a,b) ; // b *= a
return( (a<<16) | b) ;
}

/*
IDEA multiplication
borrowed from the IDEA cipher
*/
#define MAX (1<<16)
#define MOD (MAX+1)

u32 idea(u32 a, u32 b)
{
u32 x ;
// make sure they are in range
a %= MOD ;
b %= MOD ;
// special cases
if( (a == 0) && (b == 0))
return(1) ;
else if( a == 0)
return(MOD - b) ;
else if( b == 0)
return(MOD - a) ;
// non-special
else {
x = (a*b) % MOD ;
if(x == MAX)
return(0) ;
else return(x) ;
}
}