Re: [Lse-tech] [PATCH] cpusets - big numa cpu and memory placement

From: Paul Jackson
Date: Tue Oct 05 2004 - 21:14:59 EST


Martin writes:
> I agree with the basic partitioning stuff - and see a need for that. The
> non-exclusive stuff I think is fairly obscure, and unnecessary complexity
> at this point, as 90% of it is covered by CKRM. It's Andrew and Linus's
> decision, but that's my input.

Now you're trying to marginalize non-exclusive cpusets as a fringe
requirement. Thanks a bunch ;).

Instead of requiring complete exclusion for all cpusets, and pointing to
the current 'exclusive' flag as the wrong flag at the wrong place at the
wrong time (sorry - my radio is tuned to the V.P. debate in the
background), let's be clear about what sort of exclusion the schedulers,
the allocators and, here, the resource manager (CKRM) actually require.

I can envision dividing a machine into a few large, quite separate,
'soft' partitions, where each such partition is represented by a subtree
of the cpuset hierarchy, and where there is no overlap of CPUs, Memory
Nodes or tasks between the 'soft' partitions, even though there is a
possibly richly nested cpuset (cpu and memory affinity) structure within
any given 'soft' partition.

Nothing would cross 'soft' partition boundaries. So far as CPUs, Memory
Nodes, Tasks and their Affinity, the 'soft' partitions would be
separate, isolated, and non-overlapping.

Each such 'soft' partition could host a separate instance (domain) of
the scheduler, allocator, and resource manager. Any such domain would
know what set of CPUs, Memory Nodes and Tasks it was managing, and would
have complete and sole control of the scheduling, allocation or resource
sharing of those entities.
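
To make this concrete, here is a rough sketch of what carving out two
such 'soft' partitions might look like through the cpuset filesystem
interface proposed in the patch. The mount point, the file names and
the CPU/node ranges below are my assumptions for illustration only, not
a statement of the final interface:

/*
 * Sketch only: carve the machine into two non-overlapping 'soft'
 * partitions via a cpuset filesystem assumed to be mounted at
 * /dev/cpuset.  Directory and file names ("cpus", "mems", "tasks")
 * and the ranges written are illustrative assumptions.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static void write_file(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0 || write(fd, val, strlen(val)) < 0)
		perror(path);
	if (fd >= 0)
		close(fd);
}

int main(void)
{
	/* Two top-level 'soft' partitions, each a cpuset subtree. */
	mkdir("/dev/cpuset/partA", 0755);
	mkdir("/dev/cpuset/partB", 0755);

	/* partA gets CPUs 0-3 and memory nodes 0-1; partB gets the rest. */
	write_file("/dev/cpuset/partA/cpus", "0-3");
	write_file("/dev/cpuset/partA/mems", "0-1");
	write_file("/dev/cpuset/partB/cpus", "4-7");
	write_file("/dev/cpuset/partB/mems", "2-3");

	/*
	 * Nothing crosses the boundary: a task belongs to exactly one
	 * partition, attached by writing its pid (made up here) into
	 * that partition's 'tasks' file.
	 */
	write_file("/dev/cpuset/partA/tasks", "1234");

	return 0;
}

Finer-grained cpusets could then be nested below partA or partB, and
neither partition would ever see the other's CPUs, Memory Nodes or
Tasks.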

But also within a 'soft' partition, there would be finer-grained
placement, finer-grained CPU and Memory affinity, whether via the
current per-task cpus_allowed and mems_allowed, or via some improved
mechanism that the schedulers, allocators and resource managers could
better deal with.

There _has_ to be. Even if cpusets, sched_setaffinity, mbind, and
set_mempolicy all disappeared tomorrow, you would still have the per-cpu
kernel threads, which have to be placed to a tighter specification than
the whole of such a 'soft' partition.
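
Just to illustrate the kind of finer-grained placement I mean within a
partition, here is a minimal sketch using the existing user-level calls;
the CPU and memory-node numbers are made up and would have to fall
inside the enclosing 'soft' partition's cpus and mems:

/*
 * Sketch only: narrow one task down to a single CPU and a single
 * memory node inside its enclosing partition, using the existing
 * per-task interfaces mentioned above.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <numaif.h>	/* set_mempolicy(), MPOL_BIND; link with -lnuma */
#include <stdio.h>

int main(void)
{
	cpu_set_t mask;
	unsigned long nodemask = 1UL << 1;	/* memory node 1 only */

	/* Restrict this task to CPU 2, a subset of its partition's CPUs. */
	CPU_ZERO(&mask);
	CPU_SET(2, &mask);
	if (sched_setaffinity(0, sizeof(mask), &mask) < 0)
		perror("sched_setaffinity");

	/* Likewise restrict its memory allocations to a single node. */
	if (set_mempolicy(MPOL_BIND, &nodemask, 8 * sizeof(nodemask)) < 0)
		perror("set_mempolicy");

	return 0;
}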

Could you, or some appropriate CKRM guru, please try to tell me what
isolation you actually need for CKRM? Matthew or Peter, please do the
same for the schedulers.

In particular, do you need to prohibit any finer-grained placement
within a particular domain, or not? I believe not. Is it not the case
that what you really need is for the cpusets that correspond to one of
your domains (my 'soft' partitions, above) to be isolated from any other
such 'soft' partition? Is it not the case that further, finer-grained
placement within such an isolated 'soft' partition is acceptable? It
sure better be. Indeed, that's pretty much what we have now, with what
amounts to a single domain covering the entire system.

Instead of throwing out half of cpusets on the claim that it conflicts
with the requirements of the schedulers, the resource managers or (not
yet raised) the allocators, please be clearer about what the actual
requirements are.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxx> 1.650.933.1373