Re: [ckrm-tech] [RFC] Resource Management - Infrastructure choices

From: Paul Menage
Date: Tue Oct 31 2006 - 11:46:42 EST


On 10/31/06, Srivatsa Vaddagiri <vatsa@xxxxxxxxxx> wrote:
For the case where resource node hierarchy is different from process
container hierarchy, I am trying to make sense of "why do we need to
maintain two hierarchies" - one the actual hierarchy used for resource
control purpose, another the process container hierarchy. What purpose
does maintaining the process container hierarchy (in addition to the
resource controller hierarchy) solve?

The idea is that in general, people aren't going to want to have
separate hierarchies for different resources - they're going to have
the hierarchies be the same for all resources. So in general when they
move a process from one container to another, they're going to want to
move that task to use all the new resources limits/guarantees
simultaneously.

Having completely independent hierarchies makes this more difficult -
you have to manually maintain multiple different hierarchies from
userspace. Suppose a task forks while you're moving it from one
container to another? With the approach that each process is in one
container, and each container is in a set of resource nodes, at least
the child task is either entirely in the new resource limits or
entirely in the old limits - if userspace has to update several
hierarchies at once non-atomically then a freshly forked child could
end up with a mixture of resource nodes.


I am thinking we can avoid maintaining these two hierarchies, by
something on these lines:

mkdir /dev/cpu
mount -t container -ocpu container /dev/cpu

-> Represents a hierarchy for cpu control purpose.

tsk->cpurc = represent the node in the cpu
controller hierarchy. Also maintains
resource allocation information for
this node.


If we were going to do something like this, hopefully it would look
more like an array of generic container subsystems, rather than a
separate named pointer for each subsystem.


mkdir /dev/mem
mount -t container -omem container /dev/mem

-> Represents a hierarchy for mem control purpose.

tsk->memrc = represent the node in the mem
controller hierarchy. Also maintains
resource allocation information for
this node.

tsk->memrc->parent = parent node.


mkdir /dev/containers
mount -t container -ocontainer container /dev/container

-> Represents a (mostly flat?) hierarchy for the real
container (virtualization) purpose.

I think we have an overloading of terminology here. By "container" I
just mean "group of processes tracked for resource control and other
purposes". Can we use a term like "virtual server" if you're doing
virtualization? I.e. a virtual server would be a specialization of a
container (effectively analagous to a resource controller)


I suspect this may simplify the "container" filesystem, since it doesnt
have to track multiple hierarchies at the same time, and improve lock
contention too (modifying the cpu controller hierarchy can take a different
lock than the mem controller hierarchy).

Do you think that lock contention when modifying hierarchies is
generally going to be an issue - how often do tasks get moved around
in the hierarchy, compared to the other operations going on on the
system?

Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/