Re: [RFC] cpuset: remove sched domain hooks from cpusets

From: Paul Jackson
Date: Fri Oct 20 2006 - 15:19:47 EST


Suresh wrote:
> I like the direction of Nick's patch which do domain partitioning
> at the top-most exclusive cpuset.

See the reply I just posted to Nick on this.

His patch didn't partition at the top cpuset, but at its children.
It could not have done any better than that.

The top cpuset covers all online cpus on the system, which is the
same as the default sched domain partition. Partitioning there
would be a no-op, producing the same one big partition we have now.

Partitioning at any lower level, even just the immediate children
of the root cpuset as Nick's patch does, breaks load balancing for
any tasks in the top cpuset.

And even if for some strange reason that weren't a problem, still
partitioning at the level of the immediate children of the root cpuset
doesn't help much on a decent proportion of big systems. Many of my
big systems run with just two cpusets right under the top cpuset, a
tiny cpuset (say 4 cpus) for classic Unix daemons, cron jobs and init,
and a huge (say 1020 out of 1024 cpus) cpuset for the batch scheduler
to slice and dice, to sub-divide into smaller cpusets for the various
jobs and other needs it has.

These systems would still suffer from any performance problems we had
balancing a huge sched domain. Presumably the pain of balancing a
1020 cpu partition is not much less than it is for a 1024 cpu partition.

So, regrettably, Nick's patch is both broken and useless ;).

Only a finer grain sched domain partitioning, that accurately reflects
the placement of active jobs and tasks needing load balancing, is of
much use here.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxx> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/