Re: [Announce] [patch] Modular Scheduler Core and Completely Fair

From: Ph. Marek
Date: Thu Apr 19 2007 - 02:35:10 EST


Pine.LNX.4.64.0704181515290.25880 () alien ! or ! mcafeemobile ! com

Davide Libenzi wrote:
> On Wed, 18 Apr 2007, Ingo Molnar wrote:
> > That's one reason why i dont think it's necessarily a good idea to
> > group-schedule threads, we dont really want to do a per thread group
> > percpu_alloc().
>
> I still do not have clear how much overhead this will bring into the
> table, but I think (like Linus was pointing out) the hierarchy should look
> like:
...
> The "run_queue" concept (and data) that now is bound to a CPU, need to be
> replicated in:
>
> ROOT <- VCPUs add themselves here
> VCPU <- USERs add themselves here
> USER <- PROCs add themselves here
> PROC <- THREADs add themselves here
> THREAD (ultimate fine grained scheduling unit)
>
> So ROOT, VCPU, USER and PROC will have their own "run_queue".
...

I can't comment on the internals about run_queues, overhead and so on, but
these discussion leads me to the idea about a dynamic *tree* of scheduler
queues.

With dynamic I mean that they are configured in user-space - be it with
something like CLONE_NEW_SCHEDULER_CLASS, or possibly better some other
interface to allow an *arbitrary* tree that is not coupled on the
user/process/thread borders. New threads and processes are per default
created in the parents queue, just like now.

So user-space could build an tree like this (eg with a pam module):

Default queue - init
+- kernel-thread queue (to avoid having kernel threads being blocked by
| user-space)
+- cron, atd, sshd, .... unless they change their "class"
+- user1
| +- X
| +- kde
| | + konsole
| | \ kmail
| | + mail fetch thread
| | + mail filter thread
| | + GUI thread
| | \- mplayer
\- user2
+.....

Whether the queues are handled with some staircase behaviour, or CFS, or just
get CPU time distributed by nice level, is another question - but they have
to be "fair" only locally.

Of course, that's simply some sort of moving the problem into user-space - but
I think (and read that often enough) that the needs vary so much that a
single, hardcoded system won't suffice. And we can try to get the "right"
behaviour in each queue, just like now.

Walking the tree might make the scheduler not fully O(1) - but per default
only one queue is defined (or possibly two queues, one for kernel threads),
and everything else can be done by user-space.


The mentioned case of a web-server with gzip started would be done with having
each httpd being in a queue just below init, and having everything else in
another - or by nicing the webserver, as it's defined as "important".
(I believe that's called "moving policy into userspace" :-)


Regards,

Phil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/