Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism

From: George Spelvin
Date: Fri Apr 29 2016 - 00:12:44 EST


Linus wrote:
> Having looked around at other hashes, I suspect we should look at the
> ones that do five or six shifts, and a mix of add/sub and xor. And
> because they shift the bits around more freely you don't have the
> final shift (that ends up being dependent on the size of the target
> set).

I'm not sure that final shift is a problem. You need to mask the result
to the desired final size somehow, and a shift is no more cycles than
an AND.

> It really would be lovely to hear that we can just replace
> hash_int/long() with a better hash. And I wouldn't get too hung up on
> the multiplication trick. I suspect it's not worth it.

My main concern is that the scope of the search grows enormously
if we include such things. I don't want to discourage someone
from looking, but I volunteered to find a better multiplication
constant with an efficient add/subtract chain, not start a thesis
project on more general hash functions.

Two places one could look for ideas, though:
http://www.burtleburtle.net/bob/hash/integer.html
https://gist.github.com/badboy/6267743

Here's Thomas Wang's 64-bit hash, which is reputedly quite
good, in case it helps:

uint64_t hash(uint64_t key)
{
key = ~key + (key << 21); // key = (key << 21) - key - 1;
key ^= key >> 24;
key += (key << 3)) + (key << 8); // key *= 265
key ^= key >> 14;
key += (key << 2)) + (key << 4); // key *= 21
key ^= key >> 28;
key += key << 31;
return key;
}

And his slightly shorter 64-to-32-bit function:
unsigned hash(uint64_t key)
{
key = ~key + (key << 18); // key = (key << 18) - key - 1;
key ^= key >> 31;
key *= 21; // key += (key << 2)) + (key << 4);
key ^= key >> 11;
key += key << 6;
key ^= key >> 22;
return (uint32_t)key;
}


Sticking to multiplication, using the heuristics in the
current comments (prime near golden ratio = 9e3779b9 = 2654435769,)
I can come up with this for multiplying by 2654435599 = 0x9e37790f:

// -----------------------------------------------------------------------------
// This code was generated by Spiral Multiplier Block Generator, www.spiral.net
// Copyright (c) 2006, Carnegie Mellon University
// All rights reserved.
// The generated code is distributed under a BSD style license
// (see http://www.opensource.org/licenses/bsd-license.php)
// -----------------------------------------------
// Cost: 6 adds/subtracts 6 shifts 0 negations
// Depth: 5
// Input:
// int t0
// Outputs:
// int t1 = 2654435599 * t0
// -----------------------------------------------
t3 = shl(t0, 11); /* 2048*/
t2 = sub(t3, t0); /* 2047*/
t5 = shl(t2, 8); /* 524032*/
t4 = sub(t5, t2); /* 521985*/
t7 = shl(t0, 25); /* 33554432*/
t6 = add(t4, t7); /* 34076417*/
t9 = shl(t0, 9); /* 512*/
t8 = sub(t9, t0); /* 511*/
t11 = shl(t6, 4); /* 545222672*/
t10 = sub(t11, t6); /* 511146255*/
t12 = shl(t8, 22); /* 2143289344*/
t1 = add(t10, t12); /* 2654435599*/

Which translates into C as

uint32_t multiply(uint32_t x)
{
unsigned y = (x << 11) - x;

y -= y << 8;
y -= x << 25;
x -= x << 9;
y -= y << 4;
y -= x << 22;
return y;
}

Unfortunately, that utility bogs like hell on 64-bit constants.