Re: [patch] CFS scheduler, -v6

From: William Lee Irwin III
Date: Sun Apr 29 2007 - 05:00:45 EST


On Sun, Apr 29, 2007 at 12:54:36AM -0700, William Lee Irwin III wrote:
>> Common code for rbtree-based priority queues can be factored out of
>> cfq, cfs, and hrtimers.

On Sun, Apr 29, 2007 at 10:13:17AM +0200, Willy Tarreau wrote:
> In my experience, rbtrees are painfully slow. Yesterday, I spent the
> day replacing them in haproxy with other trees I developped a few
> years ago, which look like radix trees. They are about 2-3 times as
> fast to insert 64-bit data, and you walk through them in O(1). I have
> many changes to apply to them before they could be used in kernel, but
> at least I think we already have code available for other types of trees.

Dynamic allocation of auxiliary indexing structures is problematic for
the scheduler, which significantly constrains the algorithms one may
use for this purpose.

rbtrees are not my favorite either. Faster alternatives to rbtrees
exist even among binary trees; for instance, it's not so difficult to
implement a heap-ordered tree maintaining the red-black invariant with
looser constraints on the tree structure and hence less rebalancing.
One could always try implementing a van Emde Boas queue, if he felt
particularly brave.

Some explanation of the structure may be found at:
http://courses.csail.mit.edu/6.897/spring03/scribe_notes/L1/lecture1.pdf

According to that, y-trees use less space, and exponential trees are
asymptotically faster with a worst-case asymptotic running time of

O(min(lg(lg(u))*lg(lg(n))/lg(lg(lg(u))), sqrt(lg(n)/lg(lg(n)))))

for all operations, so van Emde Boas is not the ultimate algorithm by
any means at O(lg(lg(u))); in these estimates, u is the size of the
"universe," or otherwise the range of the key data type. Not to say
that any of those are appropriate for the kernel; it's rather likely
we'll have to settle for something less interesting, if we bother
ditching rbtrees at all, on account of the constraints of the kernel
environment.

I'll see what I can do about a userspace test harness for priority
queues more comprehensive than smart-queue.c. I have in mind the
ability to replay traces obtained from queues in the kernel and loading
priority queue implementations via dlopen()/dlsym() et al. valgrind can
do most of the dirty work. Otherwise running a trace for some period of
time and emitting the number of operations it got through should serve
as a benchmark. With that in hand, people can grind out priority queue
implementations and see how they compare on real operation sequences
logged from live kernels.


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/