Re: questions about new buffer hash

David S. Miller (davem@redhat.com)
Fri, 29 Oct 1999 19:44:26 -0700


Date: Sat, 30 Oct 1999 03:33:10 +0200 (CEST)
From: Andrea Arcangeli <andrea@suse.de>

o you didn't considered the time that you need to compute the
hashfn at all. With an almost empty hashtable the hashfn
computation cost will be the only cost you have to reach the
interesting data.
I agree that this may be a minor issue but it's not obvious we
can avoid to consider it in real life.

Yes I most certainly did, I count processor cycles for a living in
the work I do. Anyone who knows me well will tell you that I'll
completely rearchitect the trap handling on UltraSparc from scratch
just to save 2 processor cycles in the trap entry/exit code for that
platform.

The computational cost in real life means:

1) Some _constant_ number of processor cycles
2) Some _constant_ set of instruction cache lines to hold to hash
computation code

Compare this (in real life) to the cost of:

1) _Non-constant_ extra traversals through the main hash chain loop
2) _Non-constant_ memory references assosciated with touching
significantly more than (nr_buffer_heads / nr_hash_chains) buffer
heads.

And furthermore, I did not use any multiplications or other
multi-cycle operations in the hash function.

Furthermore, if the hash table is almost empty, there isn't much
buffer cache activity is there? So whatever small added cost my new
hash function has is lost on the profiling radar compared to whatever
other things the system must be doing.

With your completly empirical approch you may have optimized the
hashfn for ext2 (and ext2+raid) patterns.

I used UFS, MINIX, and ISO9660 file systems in my tests. I did not
explicitly list the filesystem types I had used in my initial
response, for this I apologize.

If it is impossible to generate a perfect hash for all input values,
then some data set must be chosen to act as one which is
representative of the data upon which the hash function can be
expected to act.

I believe the data sets I chose to use are something a bit more than
"empirical" as you have described it, and are a good representation of
the data sets which will be seen by the general populace of Linux
users.

Later,
David S. Miller
davem@redhat.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/