Re: cgroup: status-quo and userland efforts

From: Luke Leighton
Date: Tue Mar 03 2015 - 16:20:14 EST


Tejun Heo <tj@...> writes:


> I don't really understand your example anyway because you can classify
> by DTF / non-DTF first and then just propagate cpuset settings along.
> You won't lose anything that way, right?

without spoiling the fun by reading ahead: given the extreme
complexity of what tim's team have spent probably man-decades,
possibly even getting on for a man-century, getting right, i'm
guessing two things: (a) that he will have said "we lose everything
we worked to achieve over the past few years" and (b) "what we have
now, whilst extremely complex, works really really well: why would
we even remotely contemplate changing it, losing it or replacing it
with something that, from our deep level of expertise, we *know*
simply cannot be adequate? we just seem unable to get across to you
quite how complex it is".

tim: the only suggestion i can make here that may help is that the
team seriously discuss whether to fork the hierarchical cgroup
functionality present in the linux kernel and maintain it
indefinitely.


> I wrote about that many times, but here are two of the problems.
>
> * There's no way to designate a cgroup to a resource, because cgroup
> is only defined by the combination of who's looking at it for which
> controller. That's how you end up with tagging the same resource
> multiple times for different controllers and even then it's broken
> as when you move resources from one cgroup to another, you can't
> tell what to do with other tags.
>
> While allowing an obscene level of flexibility, multiple hierarchies
> destroy a very fundamental concept that it *should* provide - that
> of a resource container. It can't because a "cgroup" is undefined
> under multiple hierarchies.

ok, there is an alternative to hierarchies which has precedent
(and, importantly, a set of userspace management tools as well as
existing code in the linux kernel): the FLASK model, which you will
know through its implementation in SE/Linux.

whilst the majority of people view management as "hierarchical"
(there is a top dog or God process and everything trickles down
from that), this is viewed as such an anathema in the security
industry that someone came up with a formal specification for the
way permissions are actually managed in the real world, and it's
called the FLASK model.

basically you have a security policy which may, at its extremes,
either grant absolutely any and all permissions (in the case of
SE/Linux that covers quite literally every operation a system call
can perform), or none at all.

*but*, and this is the key bit: when a process execs a new program,
there is *no correlation* between the permissions that the new
process has and those of its parent. the security policy *may* say
that a parent may exec a program which runs with *more* permissions
(or even an entirely different set) than the parent.

in other words there *is* no hierarchy: it's all "flat", with the
inter-relationships defined entirely by the policy.

now, the security policy is expressed in an m4-based macro language
that may contain wildcards, includes, macros, functions and so on,
meaning that it can be kept really quite simple if properly managed
(and the SE/Linux team do an extraordinarily good job of doing
exactly that).

tejun, the reason i mention this is that it has distinct advantages.
intuitively i am guessing that you are freaking out about hierarchies
because they are effectively of potentially infinite depth. SE/Linux,
by contrast, is effectively completely flat, and the responsibility
for creating hierarchies (or not) lies with the userspace tools that
compile the m4 macros into the binary files that the kernel reads and
acts upon.

so i think you'll find that if you investigate this approach and
copy it, you should be able to keep the inherent simplicity of a
"unified" underlying approach without tim's team freaking out,
because they would still be able to create policy files based on
a hierarchical arrangement.

it would also mean that policies could be written which ensure lxc
doesn't need to be rewritten; PID1 could be allocated a specific set
of permissions that it is allowed to manage, and so on.

does that make any sense?

l.

