Re: [Lse-tech] [PATCH] cpusets - big numa cpu and memory placement

From: Martin J. Bligh
Date: Sun Oct 03 2004 - 19:05:32 EST


--"Martin J. Bligh" <mbligh@xxxxxxxxxxx> wrote (on Sunday, October 03, 2004 16:53:40 -0700):

>> Martin wrote:
>>> Matt had proposed having a separate sched_domain tree for each cpuset, which
>>> made a lot of sense, but seemed harder to do in practice because "exclusive"
>>> in cpusets doesn't really mean exclusive at all.
>>
>> See my comments on this from yesterday on this thread.
>>
>> I suspect we don't want a distinct sched_domain for each cpuset, but
>> rather a sched_domain for each of several entire subtrees of the cpuset
>> hierarchy, such that every CPU is in exactly one such sched domain, even
>> though it be in several cpusets in that sched_domain.
>
> Mmmm. The fundamental problem I think we ran across (just whilst pondering,
> not in code) was that some things (eg ... init) are bound to ALL cpus (or
> no cpus, depending how you word it); i.e. they're created before the cpusets
> are, and are a member of the grand-top-level-uber-master-thingummy.
>
> How do you service such processes? That's what I meant by the exclusive
> domains aren't really exclusive.
>
> Perhaps Matt can recall the problems better. I really liked his idea, aside
> from the small problem that it didn't seem to work ;-)
>
>> So we have eight cpusets, non-overlapping and covering the entire
>> system, each with its own sched_domain.
>
> But that's the problem ... I think there are *always* cpusets that overlap.
> Which is sad (fixable?) because it breaks lots of intelligent things we
> could do.

Hmmm. What if, when you created a new, exclusive cpuset, the CPUs you
spec'ed were *removed* from the parent cpuset (and existing processes
forcibly migrated off them)? That'd fix most of it, and would bring us much
closer to the true meaning of "exclusive". Changes your semantics a bit,
but still ...

OK, so there is one problem I can see - you couldn't remove the last CPU
from the parent if there were any jobs running in it, but that's presumably
fixable (e.g. you either move them into the newly created child, or fail
the call).
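
To make the proposed semantics concrete, here is a rough user-space toy
model, not kernel code. Everything in it (CpuSet, create_exclusive_child,
the adopt_tasks flag, and the "migrate to the lowest remaining CPU" policy)
is made up for illustration and is not an existing cpuset interface.

```python
class CpuSet:
    """Toy model of a cpuset: a set of CPU ids plus the tasks attached to it."""

    def __init__(self, name, cpus, parent=None):
        self.name = name
        self.cpus = set(cpus)
        self.tasks = {}        # task name -> CPU it currently runs on
        self.parent = parent
        self.children = []


def create_exclusive_child(parent, name, cpus, adopt_tasks=False):
    """Carve `cpus` out of `parent` into a new, truly exclusive child.

    The CPUs are *removed* from the parent, and parent tasks running on
    them are forcibly migrated. If this would take the parent's last CPU
    while it still has tasks, either fail the call or (adopt_tasks=True)
    move those tasks into the child.
    """
    cpus = set(cpus)
    if not cpus or not cpus <= parent.cpus:
        raise ValueError("child CPUs must be a non-empty subset of the parent's")

    remaining = parent.cpus - cpus
    if not remaining and parent.tasks and not adopt_tasks:
        raise RuntimeError("parent still has tasks and would be left with no CPUs")

    child = CpuSet(name, cpus, parent)
    if remaining:
        # Forcibly migrate parent tasks off the CPUs being taken away.
        # Assumed policy for the sketch: push them to the lowest remaining CPU.
        target = min(remaining)
        for task, cpu in parent.tasks.items():
            if cpu in cpus:
                parent.tasks[task] = target
    else:
        # Parent ends up with no CPUs: hand its tasks (if any) to the child.
        child.tasks, parent.tasks = parent.tasks, {}

    parent.cpus = remaining
    parent.children.append(child)
    return child
```

With an 8-CPU top-level set that has init on CPU 5, carving out CPUs 4-7
leaves the parent with CPUs 0-3 and init moved onto one of them; a second
call asking for CPUs 0-3 then fails unless adopt_tasks is set.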

M.
