Well, what processor gives that ?
At Alpha ... Depending on GCC optimization level the calculation result
of block hash is from 23 instructions (gcc optimized, 4 memory loads)
via 30 instructions (Compaq C Compiler optimized or not, along with 13
memory loads) to 42 instructions (gcc -O0, 12 memory loads)
Last instruction is load from the hash table.
(surprisingly at this particular case egcs-1.1.2 beats ccc-6.2 ..)
Presuming these three variables are near each other, and likely
within some cache-line (they are used *a lot*), they may be retrievable
from l0 cache (at least from l1 cache), and then code is also in
cache, it goes 2 instructions in parallel -- 12 clocks..
(I still consider 21164 series processors as the standard benchmark,
although 21264 can do 4 integer ops in parallel..)
-------------- hashtest.c ----------------
static unsigned int bh_hash_mask = 0;
static unsigned int bh_hash_shift = 0;
static void ** hash_table = 0;
/* After several hours of tedious analysis, the following hash
* function won. Do not mess with it... -DaveM
*/
#define _hashfn(dev,block) \
((((dev)<<(bh_hash_shift - 6)) ^ ((dev)<<(bh_hash_shift - 9))) ^ \
(((block)<<(bh_hash_shift - 6)) ^ ((block) >> 13) ^ ((block) << \
(bh_hash_shift - 12))))
#define hash(dev,block) hash_table[_hashfn(dev,block) & bh_hash_mask]
void * hashblk(long dev, long block)
{
return hash(dev,block);
}
> Later,
> David S. Miller
> davem@redhat.com
/Matti Aarnio
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/