Re: [PATCH] Scalable page cache

From: Linus Torvalds (torvalds@transmeta.com)
Date: Mon Nov 26 2001 - 12:29:51 EST


In article <Pine.LNX.4.33.0111261753480.10763-100000@localhost.localdomain>,
Ingo Molnar <mingo@elte.hu> wrote:
>
>it gets rid of the pagecache lock without introducing a tree.
>
>while reducing memory footprint is a goal we want to achieve, the
>pagecache hash is such a critical piece of data structure that we want
>O(1)-type search properties, not a tree. The pagetable hash takes up 0.2%
>of RAM currently. (but we could cut the size of the hash in half i think,
>it's a bit over-sized currently - it has as many entries.)

I actually considered a tree a long time ago, but I was thinking more
along the lines of the page table tree - with the optimization of being
able to perhaps map sub-trees _directly_ into the address space.

it's a cool idea, especially if done right (ie try to share the
functions between the VM trees and the "page cache tree"), but I was too
lazy to try it out (it's a _lot_ of work to do right). And I suspect
that it would optimize all the wrong cases, ie on x86 you could mmap
4MB-aligned areas at 4MB offsets with "zero cost", but in real life
that's not a very common situation.

Tree's _do_ have advantages over hashes, though, in having both better
cache locality and better locking locality.

I don't think a binary tree (even if it is self-balancing) is the proper
format, though. Binary trees have bad cache characteristics, and as
Ingo points out, with large files (and many performance-critical things
like databases have _huge_ files) you get bad behaviour on lookup with a
binary tree.

A indexed tree (like the page tables) has much better characteristics,
and can be looked up in O(1), and might be worth looking into. The
locality of a indexed tree means that it's MUCH easier to efficiently
fill in (or get rid of) large contiguous chunks of page cache than it is
with hashes. This can be especially useful for doing swapping, where you
don't have to look up adjacent pages - they're right there, adjacent to
your entry.

Anybody interested?

                Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Nov 30 2001 - 21:00:22 EST