Re: exclusive cpusets broken with cpu hotplug

From: Nick Piggin
Date: Thu Oct 19 2006 - 04:17:39 EST


Paul Jackson wrote:
So that depends on what cpusets asks for. If, when setting up a and
b, it asks to partition the domains, then yes that breaks the parent
cpuset gets broken.


That probably makes good sense from the sched domain side of things.

It is insanely counterintuitive from the cpuset side of things.

Using heirarchical cpuset properties to drive this is the wrong
approach.

In the general case, looking at it (as best I can) from the sched
domain side of things, it seems that the sched domain could be
defined on a system as follows.

Partition the CPUs on the system - into one or more subsets
(partitions), non-overlapping, and covering.

Each of those partitions can either have a sched domain setup on
it, to support load balancing across the CPUs in that partition,
or can be isolated, with no load balancing occuring within that
partition.

No load balancing occurs across partitions.

Correct. But you don't have to treat isolated CPUs differently - they
are just the degenerate case of a partition of 1 CPU. I assume cpusets
could create similar "isolated" domains where no balancing takes place.

Using cpu_exclusive cpusets for this is next to impossible. It could
be approximated perhaps by having just the immediate children of the
root cpuset, /dev/cpuset/*, define the partition.

Fine.

But if any lower level cpusets have any affect on the partitioning,
by setting their cpu_exclusive flag in the current implementation,
it is -always- the case, by the basic structure of the cpuset
hierarchy, that the lower level cpuset is a subset of its parents
cpus, and that that parent also has cpu_exclusive set.

The resulting partitioning, even in such simple examples as above, is
not obvious. If you look back a couple days, when I first presented
essentially this example, I got the resulting sched domain partitioning
entirely wrong.

The essential detail in my big patch of yesterday, to add new specific
sched_domain flags to cpusets, is that it -removed- the requirement to
mark a parent as defining a sched domain anytime a child defined one.

That requirement is one of the defining properties of the cpu_exclusive
flag, and makes that flag -outrageously- unsuited for defining sched
domain partitions.

So make the new rule "cpu_exclusive && direct-child-of-root-cpuset".
Your problems go away, and they haven't been pushed to userspace.

If a user wants to, for some crazy reason, have a set of cpu_exclusive
sets deep in the cpuset hierarchy, such that no load balancing happens
between them... just tell them they can't; they should just make those
cpusets children of the root.

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com -
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/