[RFC] Resource Management - Infrastructure choices

From: Srivatsa Vaddagiri
Date: Mon Oct 30 2006 - 05:29:27 EST


Over the last couple of months, we have seen a number of proposals for
resource management infrastructure/controllers and also good discussions
surrounding those proposals. These discussions has resulted in few
consensus points and few other points that are still being debated.

This RFC is an attempt to:

o summarize various proposals to date for infrastructure

o summarize consensus/debated points for infrastructure

o (more importantly) get various stakeholders agree on what is a good
compromise for infrastructure in going forward

Couple of questions that I am trying to address in this RFC:

- Do we wait till controllers are worked out before merging
any infrastructure?

IMHO, its good if we can merge some basic infrastructure now
and incrementally enhance it and add controllers based on it.
This perspective leads to the second question below ..

- Paul Menage's patches present a rework of existing code, which makes
it simpler to get it in. Does it meet container (Openvz/Linux
VServer) and resource management requirements?

Paul has ported over the CKRM code on top of his patches. So I
am optimistic that it meets resource management requirements in
general.

One shortcoming I have seen in it is that it lacks an
efficient method to retrieve tasks associated with a group.
This may be needed by few controllers implementations if they
have to support, say, change of resource limits. This however
I think could be addressed quite easily (by a linked list
hanging off each container structure).

Resource Management - Goals
---------------------------

Develop mechanisms for isolating use of shared resources like cpu, memory
between various applications. This includes:

- mechanism to group tasks by some attribute (ex: containers,
CKRM/RG class, cpuset etc)

- mechanism to monitor and control usage of a variety of resources by
such groups of tasks

Resources to be managed:

- Memory, CPU and disk I/O bandwidth (of high interest perhaps)
- network bandwidth, number of tasks/file-descriptors/sockets etc.


Proposals to date for infrastructure
------------------------------------

- CKRM/RG
- UBC
- Container implementation (by Paul Menage) based on generalization of
cpusets.


A. Class-based Kernel Resource Management/Resource Groups

Framework to monitor/control use of various resources by a group of
tasks as per specified guarantee/limits.

Provides a config-fs based interface to:

- create/delete task-groups
- allow a task to change its (or some other task's) association
from one group to other (provided it has the right
privileges). New children of the affected task inherit the
same group association.
- list tasks present in a group (A group can exist without any
tasks associated with it)
- specify group's min/max use of various resources. A special
value "DONT_CARE" specifies that the group doesn't care for
how much resource it gets.
- obtain resource usage statistics
- Supports heirarchy depth of 1 (??)

In addition to this user-interface, it provides a framework for
controllers to:

- register/deregister themselves
- be intimated about changes in resource allocation for a group
- be intimated about task movements between groups
- be intimated about creation/deletion of groups
- know which group a task belongs to

B. UBC

Framework to account and limit usage of various resources by a
container (group of tasks).

Provides a system call based interface to:

- set a task's beancounter id. If the id does not exist, a new
beancounter object is created
- change a task's association from one beancounter to other
- return beancounter id to which the calling task belongs
- set limits of consumption of a particular resource by a
beancounter
- return statistics information for a given beancounter and
resource.


Provides a framework for controllers to:

- register various resources
- lookup beancounter object given a particular id
- charge/uncharge usage of some resource to a beancounter by
some amount
- also know if the resulting usage is above the allowed
soft/hard limit.
- change a task's accounting beancounter (usefull in, say,
interrupt handling)
- know when the resource limits change for a beancounter

C. Paul Menage's container patches

Provides a generic heirarchial process grouping mechanism based on
cpusets, which can be used for resource management purposes.

Provides a filesystem-based interface to:

- create/destroy containers
- change a task's association from one container to other
- retrieve all the tasks associated with a container
- know which container a task belongs to (from /proc)
- know when the last task belonging to a container has exited


Consensus/Debated Points
------------------------

Consensus:

- Provide resource control over a group of tasks
- Support movement of task from one resource group to another
- Dont support heirarchy for now
- Support limit (soft and/or hard depending on the resource
type) in controllers. Guarantee feature could be indirectly
met thr limits.

Debated:
- syscall vs configfs interface
- Interaction of resource controllers, containers and cpusets
- Should we support, for instance, creation of resource
groups/containers under a cpuset?
- Should we have different groupings for different resources?
- Support movement of all threads of a process from one group
to another atomically?

--
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/