Re: [patch] hashtable sizes for icache and dcache
Alexander Kjeldaas (astor@fast.no)
Mon, 27 Sep 1999 16:34:18 +0200
On Thu, Sep 23, 1999 at 03:59:36PM +0200, Ingo Molnar wrote:
>
> On Thu, 23 Sep 1999, Andrea Arcangeli wrote:
>
> > >this patch alone indeed is just fixing the symptoms. If such a patch makes
> >
> > You don't understand the difference between fixing the symptom and fixing
> > the cause of a problem.
>
> what i and kernel@kvack.org (Rik? he has sent the mail privately to you i
> think) meant: fixing the hash bit-order alone addresses just a symptom of
> the real problem. The real problem is that boxes with different memory
> sizes have a fundamentally different 'optimal hashsize'.
>
> Historically, when the dcache/inode/buffer cache/page cache/tcp/etc hashes
> were developed, only one hashsize was picked, which was (hopefully) the
> optimum for the memory size the original developer used. This worked
> pretty well for the 4M-32M RAM range. In the last 2 years or so RAM prices
> finally started to converge to their production costs, and bigger-RAM
> boxes started to appear. The symptom (which i think Leonard Zubkoff or
> Doug Ledford noticed more than a year ago): on big-memory boxes
> (512M, 1G, 2G RAM) we waste many cycles on cache misses and list
> walking.
>
> the wrong solution: changing HASH_BITS ad hoc for one specific (big)
> memory size. This slows down the 4M-32M RAM range (both by wasting
> memory and by losing cache-locality in that huge hashtable). Those
> small-memory boxes are very important to Linux as well.
>
> the real solution: to put an architecture in place that makes it easy to
> boot-time tune the hashes depending on RAM size. Some parts of the code
> (buffer-cache) did this already, but we clearly needed to do it all at
> once, and in the same way. This is David's (Chuck's) patch. Actually it
> turns out that hashsize-heuristics are very subsystem-dependent, so
> David's patch does hash sizing differently in every place, but we still
> needed some central patch that does it in the same style in every
> important place; the few remaining places can now tune up to this
> methodology. Actually, providing such a complete patch is much harder
> than it looks: it needs a broad understanding of all the subsystems and
> careful tuning, which is probably why it took a year or so for someone
> to take up the issue :)
>
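(For concreteness, the boot-time tuning described above amounts to
something like the following sketch -- the names are invented here,
not taken from the actual patch:)

/* Pick the number of hash bits from the amount of RAM found at
 * boot: roughly one bucket per 'pages_per_bucket' pages of memory. */
static unsigned int hash_bits_for(unsigned long mempages,
                                  unsigned int pages_per_bucket)
{
        unsigned long goal = mempages / pages_per_bucket;
        unsigned int bits = 0;

        while ((1UL << bits) < goal)
                bits++;
        return bits;            /* the table gets 1 << bits buckets */
}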
Have you looked into using linear hashing? It gives you a tuned
hash-size at all times. It is pretty easy to use linear hashing with
a collection of 4k pages, too, by adding an extra indirection. My
guess is that, in most cases, having optimally tuned hash-tables at
all times outweighs the slowdown imposed by going through an extra
(cache-hit friendly) pointer indirection.
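For reference, the per-lookup cost of linear hashing itself is tiny;
the bucket computation is just something like this (a sketch, not
kernel code):

/* 'level' is the current doubling round and 'split' the next bucket
 * to be split; both advance as the table grows one bucket at a time. */
static unsigned long lh_bucket(unsigned long hash, unsigned int level,
                               unsigned long split)
{
        unsigned long b = hash & ((1UL << level) - 1);

        if (b < split)          /* bucket already split this round: */
                b = hash & ((2UL << level) - 1);  /* use one more bit */
        return b;
}

When the load factor crosses a threshold, the entries in bucket
'split' are redistributed between 'split' and 'split + 2^level', and
'split' (eventually 'level') advances; that is the incremental
rehashing referred to below.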
I imagine such a hash-table as a collection of 4k pages. The first
page contains a list of pointers to the hash-table pages at its end,
with the rest of it optionally filled with hash-table entries, laid
out something like this:
----------
[hash-entry]
...
[pointer to Nth page]
[pointer to (N-1)th page]
...
[pointer to 1st page]
----------
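In C, the master page could be declared as something like this (the
sizes and names are invented):

#define PAGE_BYTES 4096
#define NPTRS      64   /* pointer slots currently in use; grows over time */

struct entry_page {             /* the other pages: entries only */
        void *entry[PAGE_BYTES / sizeof(void *)];
};

struct master_page {            /* first page: entries, then pointers */
        void *entry[PAGE_BYTES / sizeof(void *) - NPTRS];
        struct entry_page *page[NPTRS];   /* [Nth page] ... [1st page] */
};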
The other pages contain only hash-entries. The reason for a layout
like the above is that with linear hashing, at some point you have to
add a new page. With the above scheme, you can do this by allocating
a new page and copying a small data area at the end of the previous
"master page" [the pointers] to the new "master page". This makes
adding a new page a pretty light-weight operation. With linear
hashing, the rehashing work is also spread across many accesses, so
the worst-case latency of this scheme shouldn't be too bad.
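A sketch of that page-add step (this is one way to read the scheme
above -- names invented, no locking or error paths):

#include <linux/mm.h>
#include <linux/string.h>

/* The freshly allocated page becomes the new master page: the n page
 * pointers at the tail of the old master are copied to its tail, plus
 * one new slot for the old master itself, which is demoted to a plain
 * entry page. */
static void *add_page(void *master, unsigned long n)
{
        void *fresh = (void *)__get_free_page(GFP_KERNEL);
        void **src, **dst;

        if (!fresh)
                return master;          /* grow again later */

        src = (void **)((char *)master + 4096) - n;      /* old tail */
        dst = (void **)((char *)fresh + 4096) - (n + 1); /* new tail */
        memcpy(dst + 1, src, n * sizeof(void *));
        dst[0] = master;        /* pointer to the (n+1)th entry page */
        return fresh;           /* the new master page */
}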
The above scheme is limited to handling
1024*4096/sizeof(hash-entry), i.e. 1M entries if the hash-entry is a
32-bit pointer (1024 page pointers times 1024 entries per page, 4 MB
of table). I guess it could be expanded to use 8k pages if the
allocator in the kernel can provide enough page-twins.
astor