Re: scheduler scalability - cgroups, cpusets and load-balancing

From: Paul Jackson
Date: Tue Jan 29 2008 - 14:04:24 EST


Gregory wrote:
> IMHO it works well the way it is: The user selects the class for a
> particular task using sched_setscheduler(), and they select the cpuset
> (or inherit it) that defines its execution scope. If that scope has
> balancing enabled, the policy for the member classes is in effect.

Ok.

For the various classes of schedulers (sched_class's), it's fine by me
if sched domains are polymorphic, supporting all classes, and it is
left to each task to self-select the scheduling class of its preference.

For the batch scheduler case, this -must- be imposable from outside
the task, by the batch scheduler that is overseeing the job, and it
must support the batch scheduler being able to disable all the
balancers in selected cpusets (selected sched_domains).

We have that now. Each of us only knew of part of the solution,
but we managed to arrive at the desired answer even so ... amazing.

The batch scheduler just has to arrange to get 'sched_load_balance'
turned off in a cpuset and all overlapping cpusets, and then the
CPUS in that cpuset will not belong to -any- sched_domain, and hence
(could you verify I'm right in this detail?) won't be balanced by any
sched_class.

I should update the documentation for sched_load_balance, changing it
from saying that you get realtime by turning off sched_load_balance in
the RT cpuset, to saying that you get realtime by (1) turning off
sched_load_balance in any overlapping cpusets, including all
encompassing parent cpusets, (2) leaving sched_load_balance on in the
RT cpuset itself, and (3) having those realtime tasks each self-select
(elect) the desired SCHED_* using sched_setscheduler().

Condition (1) above is a tad difficult to understand, but servicable,
I guess. The combination of (1) and (2) results in a separate
sched_domain just for the CPUs in the RT cpuset.

> (on this topic, note that I do not know if the RT-balancer will
> respect the cpuset concept of "balance-enabled" anyway. That might
> have to be fixed)

Er eh ... it has no choice. If the user space code has configured a
cpuset with 'sched_load_balance' turned off in that cpuset and all
overlapping cpusets, then there will not even be a sched_domain
covering those CPUs, and hence no balancer, RT or other class, will
even see those CPUs.

Unless I really don't understand the kernel/sched.c sched_domain code
(a distinct possibility), if some CPU is not in any sched_domain, then
it won't get balanced, RT or otherwise.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxx> 1.940.382.4214
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/