Re: [Lse-tech] [PATCH] cpusets - big numa cpu and memory placement
From: Paul Jackson
Date: Thu Oct 07 2004 - 13:20:12 EST
Martin wrote:
>
> So we have the purely exclusive stuff, which needs kernel support in the form
> of sched_domains alterations. The rest of cpusets is just poking and prodding
> at cpus_allowed, the membind API, and the irq binding stuff. All of which
> you could do from userspace, without any further kernel support, right?
> Or am I missing something?
Well ... we're gaining. A couple of days ago you were suggesting
that cpusets could be replaced with some exclusive domains plus
CKRM.
Now it's some exclusive domains plus poking the affinity masks.
Yes - you're still missing something.
But I must keep in mind that I had concluded, perhaps three years ago,
just what you conclude now: that cpusets is just poking some affinity
masks, and that I could do most of it from user land. The result ended
up missing some important capabilities. User level code could not
manage collections of hardware nodes (sets of CPUs and Memory Nodes) in
a co-ordinated and controlled manner.
The users of cpusets need to have system wide names for them, with
permissions for viewing, modifying and attaching to them, and with the
ability to list both what hardware (CPUs and Memory) in a cpuset, and
what tasks are attached to a cpuset. As is usual in such operating
systems, the kernel manages such system wide synchronized controlled
access views.
As I quote below, I've been saying this repeatedly. Could you
tell me, Martin, whether the disconnect is:
1) that you didn't yet realize that cpusets provided this model (names,
permissions, ...) or
2) you don't think such a model is useful, or
3) you think that such a model can be provided sensibly from user space?
If I knew this, I could focus my response better.
The rest of this message is just quotes from this last week - many
can stop reading here.
===
Date: Fri, 1 Oct 2004 23:06:44 -0700
From: Paul Jackson <pj@xxxxxxx>
Even the flat model (no hierarchy) uses require some way to
name and control access to cpusets, with distinct permissions
for examining, attaching to, and changing them, that can be
used and managed on a system wide basis.
===
Date: Sat, 2 Oct 2004 12:14:30 -0700
From: Paul Jackson <pj@xxxxxxx>
And our customers _do_ want to manage these logically isolated
chunks as named "virtual computers" with system managed permissions
and integrity (such as the system-wide attribute of "Exclusive"
ownership of a CPU or Memory by one cpuset, and a robust ability
to list all tasks currently in a cpuset).
===
Date: Sat, 2 Oct 2004 19:26:03 -0700
From: Paul Jackson <pj@xxxxxxx>
Consider the following use case scenario, which emphasizes this
isolation aspect (and ignores other requirements, such as the need for
system admins to manage cpusets by name [some handle valid across
process contexts], with a system wide imposed permission model and
exclusive use guarantees, and with a well defined system supported
notion of which tasks are "in" which cpuset at any point in time).
===
Date: Sun, 3 Oct 2004 18:41:24 -0700
From: Paul Jackson <pj@xxxxxxx>
SGI makes heavy and critical use of the cpuset facilities on both Irix
and Linux that have been developed since pset. These facilities handle
both cpu and memory placment, and provide the essential kernel support
(names and permissions and operations to query, modify and attach) for a
system wide administrative interface for managing the resulting sets of
CPUs and Memory Nodes.
===
Date: Tue, 5 Oct 2004 02:17:36 -0700
From: Paul Jackson <pj@xxxxxxx>
To: "Martin J. Bligh" <mbligh@xxxxxxxxxxx>
The /dev/cpuset pseudo file system api was chosen because it was
convenient for small scale work, learning and experimentation, because
it was a natural for the hierarchical name space with permissions that I
required, and because it was convenient to leverage existing vfs
structure in the kernel.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxx> 1.650.933.1373
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/