Re: [PATCH] capabilities: add capability cgroup controller

From: Topi Miettinen
Date: Sat Jul 02 2016 - 07:21:25 EST


On 06/28/16 04:57, Eric W. Biederman wrote:
> Topi Miettinen <toiwoton@xxxxxxxxx> writes:
>
>> On 06/24/16 17:21, Eric W. Biederman wrote:
>>> "Serge E. Hallyn" <serge@xxxxxxxxxx> writes:
>>>
>>>> Quoting Tejun Heo (tj@xxxxxxxxxx):
>>>>> Hello,
>>>>>
>>>>> On Fri, Jun 24, 2016 at 10:59:16AM -0500, Serge E. Hallyn wrote:
>>>>>> Quoting Tejun Heo (tj@xxxxxxxxxx):
>>>>>>> But isn't being recursive orthogonal to using cgroup? Why not account
>>>>>>> usages recursively along the process hierarchy? Capabilities don't
>>>>>>> have much to do with cgroup but everything to do with the process hierarchy.
>>>>>>> That's how they're distributed and modified. If monitoring their
>>>>>>> usages is necessary, it makes sense to do it in the same structure.
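Tejun's alternative, accounting along the process hierarchy, could follow the pattern getrusage(RUSAGE_CHILDREN) already uses for resource usage: when a child is reaped, its usage is folded into its parent. The minimal sketch below applies that pattern to a capability bitmask. struct proc_caps and fold_into_parent() are hypothetical names used only to illustrate the idea, not an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process record (illustration only), modelled on how
 * getrusage(RUSAGE_CHILDREN) accumulates the statistics of reaped children. */
struct proc_caps {
    struct proc_caps *parent;
    uint64_t used;           /* capabilities this process itself used */
    uint64_t children_used;  /* union over all reaped descendants */
};

/* Called when `p` exits and is reaped: fold its own usage, and that of its
 * already-reaped descendants, into the parent, so a short-lived child's
 * usage is not lost. */
void fold_into_parent(struct proc_caps *p)
{
    assert(p != NULL && p->parent != NULL);
    p->parent->children_used |= p->used | p->children_used;
}
```

As with RUSAGE_CHILDREN, a descendant's usage only reaches an ancestor if every intermediate process waits for its children.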
>>>>>>
>>>>>> That was my argument against using cgroups to enforce a new bounding
>>>>>> set. For tracking though, the cgroup process tracking seems as applicable
>>>>>> to this as it does to systemd tracking of services. It tracks a task and
>>>>>> the children it forks.
>>>>>
>>>>> Just monitoring is less jarring than implementing security enforcement
>>>>> via cgroup, but it is still jarring. What's wrong with recursive
>>>>> process hierarchy monitoring, which is in line with how the whole facility
>>>>> is implemented anyway?
>>>>
>>>> As I think Topi pointed out, one shortcoming is that if there is a short-lived
>>>> child task, using its /proc/self/status is racy. You might just miss that it
>>>> ever even existed, let alone that the "application" needed it.
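To make the race concrete: a snapshot-based monitor has to read each task's status file while the task still exists. The minimal C sketch below parses the CapEff line of /proc/<pid>/status, which is printed as a hexadecimal mask. The function names are hypothetical. Note that CapEff reports the capabilities a task holds, not the ones it has actually exercised, and a task that exits before the read is never observed at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse a line such as "CapEff:\t0000000000003000" from /proc/<pid>/status.
 * Returns true and stores the mask in *out if the line starts with `key`
 * immediately followed by ':'. */
bool parse_cap_line(const char *line, const char *key, uint64_t *out)
{
    size_t klen = strlen(key);
    char *end;

    assert(line != NULL && key != NULL && out != NULL);
    if (strncmp(line, key, klen) != 0 || line[klen] != ':')
        return false;
    /* strtoull() skips the leading tab; the mask is printed in hex. */
    *out = strtoull(line + klen + 1, &end, 16);
    return end != line + klen + 1;
}

/* True if capability number `cap` (e.g. 12 for CAP_NET_ADMIN) is in `mask`. */
bool cap_in_mask(uint64_t mask, int cap)
{
    assert(cap >= 0 && cap < 64);
    return (mask >> cap) & 1u;
}

/* Snapshot a task's effective set. If the task has already exited, fopen()
 * fails and whatever it did with its capabilities is never observed. */
bool read_cap_eff(const char *pid, uint64_t *out)
{
    char path[64], line[256];
    bool found = false;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%s/status", pid);
    f = fopen(path, "r");
    if (f == NULL)
        return false;
    while (!found && fgets(line, sizeof(line), f) != NULL)
        found = parse_cap_line(line, "CapEff", out);
    fclose(f);
    return found;
}
```

For example, a mask of 0x3000 has bits 12 and 13 set, i.e. CAP_NET_ADMIN and CAP_NET_RAW.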
>>>>
>>>> Another alternative we've both mentioned is to use systemtap. That's not
>>>> as nice a solution as a cgroup, but then again this isn't really a common
>>>> case, so maybe it is precisely what a tracing infrastructure is meant for.
>>>
>>> Hmm.
>>>
>>> We have capability use wired up into auditing. So we might be able to
>>> get away with just adding an appropriate audit message in
>>> commoncap.c:cap_capable that honors the audit flag and logs an audit
>>> message. The hook in selinux already appears to do that.
>>>
>>> Certainly audit sounds like the subsystem for this kind of work, as its
>>> whole point in life is logging things; then something in userspace can
>>> just run over the audit logs and build a nice summary.
>>
>> Even simpler would be to avoid the complexity of the audit subsystem and
>> just printk() when a task starts using a capability for the first time
>> (not on further uses by the same task). There are not that many capability
>> bits nor privileged processes, meaning not too many log entries. I know, as
>> this was actually my first approach. But it's also far less user friendly
>> than just reading a summarized value which could be fed directly back into
>> the configuration.
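The "log once per task per capability" idea needs only one bit per capability per task, which is why the number of messages stays small. Below is a minimal user-space model, assuming hypothetical names (struct task_cap_usage, note_cap_use()) and using printf() in place of printk(). It is not the code from the original patch; a kernel version would have to keep the mask alongside the task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-task record: one bit per capability number. Fewer than
 * 64 capabilities are defined, so a uint64_t is enough for this model. */
struct task_cap_usage {
    int pid;
    uint64_t used;  /* bit N set once capability N has been reported */
};

/* Record a use of `cap` by the task. Emits a log line (standing in for
 * printk()) only the first time this task uses this capability; returns
 * true when a line was emitted. */
bool note_cap_use(struct task_cap_usage *t, int cap)
{
    uint64_t bit;

    assert(cap >= 0 && cap < 64);
    bit = UINT64_C(1) << cap;
    if (t->used & bit)
        return false;  /* already reported for this task */
    t->used |= bit;
    printf("pid %d: first use of capability %d\n", t->pid, cap);
    return true;
}
```

With this scheme each task produces at most one line per capability bit, so the log volume is bounded by the number of privileged tasks times the number of capabilities.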
>
> Your loss.
>
>> The logging/auditing approach also doesn't work well for other things I'd
>> like to present as meaningful values to the user. For example, consider
>> RLIMIT_AS, where my goal is also to enable users to configure this limit
>> for a service. Should there be an audit message whenever the address space
>> usage grows (i.e. on each mmap())? What about when it shrinks? For
>> RLIMIT_NOFILE we'd have to report each open()/close()/dup()/socket()/etc.
>> and track how many are open at the same time. I think it's better to store
>> the fully cooked (meaningful to the user) value in the kernel and present
>> it only when asked.
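Keeping a "fully cooked" value can be as simple as a counter plus its high-water mark, updated on every open and close, instead of one log record per system call. A minimal sketch with hypothetical names (struct hiwater, hiwater_inc(), hiwater_dec()); this is not an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "fully cooked" counter: the current value plus its
 * high-water mark, e.g. the number of open file descriptors of a service. */
struct hiwater {
    size_t current;
    size_t peak;
};

/* Called on open()/dup()/socket() and similar: one more object in use. */
void hiwater_inc(struct hiwater *h)
{
    h->current++;
    if (h->current > h->peak)
        h->peak = h->current;
}

/* Called on close(): one fewer object in use; the peak is left untouched. */
void hiwater_dec(struct hiwater *h)
{
    assert(h->current > 0);
    h->current--;
}
```

Only `peak` would need to be reported when asked; a service's RLIMIT_NOFILE could then be set from it, presumably with some margin.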
>
> That doesn't have anything to do with anything.
>
> My suggestion was very much to do with capabilities, which are already
> logged through the audit subsystem by SELinux. The idea was to move
> those audit calls into commoncap, where they arguably belong, allowing
> anyone to use them for anything.
>
> That is a non-controversial code cleanup that happens to cover your
> special case. That is enough to build a tool in userspace that will
> tell you which capabilities you need without penalizing the kernel or
> the vast majority of users who do not use your feature.
>
> From what I have seen of this conversation there is not and will not be
> one interface to rule them all.

Now that I know taskstats better, it looks like a good choice for most
of the high-water marks, complemented by audit logging. The taskstats
interface is only available to privileged processes, but that's OK. I'll
make new patches based on this approach.

-Topi


>
> Eric
>