Re: Generic B-tree implementation

From: Vishal Patil
Date: Wed Jul 19 2006 - 12:25:05 EST


Andrea

Thank you for your time and a very valuable input. I was thinking of
implementing the VM management using B-trees because I wanted to play
with something interesting in the kernel. However I will definately
look into your idea of page cache as well.

Will keep everyone informed about my progress.

- Vishal


On 7/19/06, Andrea Arcangeli <andrea@xxxxxxx> wrote:
On Wed, Jul 19, 2006 at 09:34:43AM -0400, Vishal Patil wrote:
> I can get rid of recursions using loops, will need to work a little more on
> it.

Before doing the above you may want to learn about all possible malloc
retvals too and to make sure the interface has all needed oom failure
paths that you're obviously missing.

One of the advantages of rbtree vs b-trees (and vs radixtrees too) is
the fact they require zero dynamic metadata allocations of ram. They
use the same trick of list.h to avoid it while still being mostly
generic and sharable library code. Imagine rbtrees like scalable
lists. The kernel usage is quite optimized too, the mmap path for
example does a single lookup and it stores the last "lookup" point
before restarting with an insertion while keeping the mmap_sem (or
mutex renaming of the day) on hold so to avoid the insertion operation
to start over with a second (wasteful) lookup (again very similar to
what you could do if you had list, and the rebalancing is a very
immediate operation too involving only a limited number of pointers).

> Also I will be working on developing a patch for VM management using
> B-trees instead of RB-trees.

Once you start changing those bits, you'll notice the further
requirement of the btrees due to the oom failures in code paths that
are already reasonably complex with vma oom failures.

As speed of cache raises faster than speed of ram, memory seeks tends
to cost more than they did in the past, but I doubt it worth it, most
important especially in the common case of very few vmas. I like the
common case of only a few dozen vmas to be so fast and low
overhead. The corner cases like uml and oracle already use nonlinear,
to also avoid the ram overhead of the vmas, with btree the lowmem
overhead would be even higher (the only 4/8 bytes of overhead of the
rbtrees would even be fixable with David's patch, but nobody
considered it very important so far to eliminate those 4/8 bytes
32bit/64bit per vma, though we can do that in the future). So even if
btree would be faster for those extreme corner cases, it would still
not be a replacement for the nonlinear (I wish there was a decent
replacement for nonlinear, whose only reason to exist seems to be uml
on 64bit archs).

If I would be in you, as a slightly more likely to succeed experiment,
I would be looking into replacing the pagecache radix-tree with a
btree, as long as you can leave intact the tagging properties we have
in the radix-tree needed for finding only dirty elements in the tree
etc... (we use that to avoid separate dirty lists for the pages). You
should also size the order to automatically match the cache size of
the arch (dunno if it's better at compile or run time). I'm no a
radix-tree guru but the btree may save some ram if you've all
pagecache pages scattered all over the place with random access. It
also won't require all levels to be allocated. However it will require
rebalancing, something the radix tree doesn't require, it seems a bit
of a tradeoff, and I suspect the radix-tree will still win in all
common cases. But at least all oom failure paths should already exists
for you, so that should avoid you having to touch much code externally
to your own btree files.

I wish you to have fun with the btrees, I remember I had fun back then
when I was playing with the rbtrees ;).



--
Motivation will almost always beat mere talent.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/