Re: [PATCH v5] cgroup: Add new capability to allow a process to migrate other tasks between cgroups

From: John Stultz
Date: Tue Dec 13 2016 - 11:50:32 EST

On Tue, Dec 13, 2016 at 8:39 AM, Casey Schaufler <casey@xxxxxxxxxxxxxxxx> wrote:
> On 12/13/2016 1:47 AM, Michael Kerrisk (man-pages) wrote:
>> Hi John,
>> On 13 December 2016 at 02:39, John Stultz <john.stultz@xxxxxxxxxx> wrote:
>>> This patch adds CAP_GROUP_MIGRATE and logic to allows a process
>>> to migrate other tasks between cgroups.
>>> In Android (where this feature originated), the ActivityManager
>>> tracks various application states (TOP_APP, FOREGROUND,
>>> BACKGROUND, SYSTEM, etc), and then as applications change
>>> states, the SchedPolicy logic will migrate the application tasks
>>> between different cgroups used to control the different
>>> application states (for example, there is a background cpuset
>>> cgroup which can limit background tasks to stay on one low-power
>>> cpu, and the bg_non_interactive cpuctrl cgroup can then further
>>> limit those background tasks to a small percentage of that one
>>> cpu's cpu time).
>>> However, for security reasons, Android doesn't want to make the
>>> system_server (the process that runs the ActivityManager and
>>> SchedPolicy logic), run as root. So in the Android common.git
>>> kernel, they have some logic to allow cgroups to loosen their
>>> permissions so CAP_SYS_NICE tasks can migrate other tasks between
>>> cgroups.
>>> I feel the approach taken there overloads CAP_SYS_NICE a bit much
>>> for non-android environments. Efforts to re-use CAP_SYS_RESOURCE
>>> for this purpose (which Android has since adopted) was also
>>> stymied by concerns about risks from future cgroups that could be
>>> considered "dangerous" by how they might change system semantics.
>>> So to avoid overlapping usage, this patch adds a brand new
>>> process capability flag (CAP_CGROUP_MIGRATE), and uses it when
>>> checking if a task can migrate other tasks between cgroups.
>>> I've tested this with AOSP master (though its a bit hacked in as
>>> I still need to properly get the selinux bits aware of the new
>>> capability bit) with selinux set to permissive and it seems to be
>>> working well.
>>> Thoughts and feedback would be appreciated!
>> So, back to the discussion of silos. I understand the argument for
>> wanting a new silo. But, in that case can we at least try not to make
>> it a single-use silo?
>> How about CAP_CGROUP_CONTROL or some such, with the idea that this
>> might be a capability that allows the holder to step outside usual
>> cgroup rules? At the moment, that capability would allow only one such
>> step, but maybe there would be others in the future.
> I agree, but want to put it more strongly. The granularity of
> capabilities can never be fine enough for some people, and this
> is an example of a case where you're going a bit too far. If the
> use case is Android as you say, you don't need this. As my friends
> on the far side of the aisle would say, "just write SELinux policy"
> to correctly control access as required.

So.. The trouble is that while selinux is good for restricting
permissions, the in-kernel permission checks here are already too
restrictive. It seems one must first loosen things up before we can
tighten it with selinux rules. Or are you suggesting the system_server
run as root + further selinux limitations? I worry, the Android
developers may still be hesitant to do that.