Re: [RFC] Implement ambient capability set.

From: Andy Lutomirski
Date: Wed Feb 04 2015 - 16:24:52 EST


On Wed, Feb 4, 2015 at 1:16 PM, Serge E. Hallyn <serge@xxxxxxxxxx> wrote:
> Quoting Andy Lutomirski (luto@xxxxxxxxxxxxxx):
>> On Wed, Feb 4, 2015 at 10:49 AM, Christoph Lameter <cl@xxxxxxxxx> wrote:
>> > An attempt to implement this. Probably missing some fine points:
>> >
>> > Subject: [capabilities] Implement ambient capability set.
>> >
>> > DRAFT -- untested -- DRAFT
>> >
>> > Implement an ambient capabilty set to allow capabilties
>> > to be inherited with unix semantics used also for other
>> > attributes.
>> >
>> > Implements PR_CAP_AMBIENT. The second argument to prctl
>> > is a the capability number and the third the desired state.
>> > 0 for off. Otherwise on.
>> >
>> > Serge:
>> > A new capability set, pA, is empty by default. You can
>> > add bits to it using prctl if ns_capable(CAP_SETPCAP) and
>> > all the new bits are in your pE. Once set, they stay until
>> > they are removed using prctl. At exec, pA' = pA, and
>> > fI |= pA (after reading fI from disk but before
>> > calculating pI').
>> >
>> > Since the ambient caps "stay on" cap_inheritable does not
>> > really matter anymore. Simply set the permitted caps when
>> > the ambient cap is set.
>> >
>> > Signed-off-by: Christoph Lameter <cl@xxxxxxxxx>
>> >
>> > Index: linux/security/commoncap.c
>> > ===================================================================
>> > --- linux.orig/security/commoncap.c 2015-02-04 09:44:25.000000000 -0600
>> > +++ linux/security/commoncap.c 2015-02-04 12:48:44.100471600 -0600
>> > @@ -353,7 +353,7 @@ static inline int bprm_caps_from_vfs_cap
>> > /*
>> > * pP' = (X & fP) | (pI & fI)
>>
>> For a final patch, this comment should be fixed.
>>
>> > */
>> > - new->cap_permitted.cap[i] =
>> > + new->cap_permitted.cap[i] = current_cred()->cap_ambient.cap[i] |
>> > (new->cap_bset.cap[i] & permitted) |
>> > (new->cap_inheritable.cap[i] & inheritable);
>>
>> IMO dropping a bit from the permitted set and then calling execve
>> while ruid != 0 MUST NOT re-add that bit to the permitted set. It
>> doesn't in current kernels, and changing that is a really bad idea, I
>> think.
>>
>> So either we need an assertion that cap_ambiant is a subset of
>> cap_permitted (and code to make that assertion hold) or we should
>> change the rule above. I like:
>>
>> (current_cred()->cap_ambient.cap[i] & permitted)
>>
>> although I'd also be okay w/ something closer to Andrew's suggestion,
>> as long as it had the "& permitted" part.
>>
>> >
>> > @@ -577,6 +577,7 @@ skip:
>> > }
>> >
>> > new->securebits &= ~issecure_mask(SECURE_KEEP_CAPS);
>> > + new->cap_ambient = old->cap_ambient;
>> > return 0;
>> > }
>> >
>> > @@ -933,6 +934,20 @@ int cap_task_prctl(int option, unsigned
>> > new->securebits &= ~issecure_mask(SECURE_KEEP_CAPS);
>> > return commit_creds(new);
>> >
>> > + case PR_CAP_AMBIENT:
>> > + if (!ns_capable(current_user_ns(), CAP_SETPCAP))
>> > + return -EPERM;
>>
>> I maintain my assertion that, if we allow root (or, worse, setuid
>> root) to do this, then someone will come up with the brilliant idea of
>> writing a program (perhaps called something like seunshare) that does
>> this, then drops all caps (except ambient caps) and runs something
>> untrusted. Then we lose. So I'd prefer to check
>
> Well it's sort of the point of this patchset. If you trusted the code
> being run, you could assign fI. If you can't be bothered to assign fI,
> then you may feel you *need* to run it, but you probably don't really
> trust the binary.
>
>> task_no_new_privs(current) instead of ns_capable(current_user_ns(),
>
> .... I'm ok with that. And iiuc it shouldn't get in the way of
> Christoph's use case. I'd just rather not have one set of convoluted
> new rules now, and the have to relax them later bc it turns out ppl
> needed that.
>
> Christoph, would your code run ok under NNP?

But someone will want to run *bash* as an untrusted user with, say,
CAP_NET_BIND permitted and ambient. Then that user has a non-empty
ambient set, and they can run a setuid-root program, and who knows
what will go wrong? Requiring no_new_privs would prevent this type of
failure entirely.

If we need to relax that later, it's easy, I think. The rule's not
that convoluted, and there's precedent for having new fancy features
require setting no_new_privs first.

>
>> CAP_SETPCAP). I could be talked out of this, though, but I'd want to
>> see a decent argument why.
>>
>> In fact, even with your proposal of writing a tool that does this and
>> then calls a helper, that helper might try to use privilege separation
>> and open a big hole because clearing pP is no longer sufficient to
>> drop privileges. Changing the evolution rule as above would fix this.
>
> Yeah... "because clearing pP is no longer sufficient to drop privileges"
> is reasonably convincing.
>
>> > +
>> > + if (!cap_valid(arg2))
>> > + return -EINVAL;
>> > +
>> > + new =prepare_creds();
>> > + if (arg3 == 0)
>> > + cap_lower(new->cap_ambient, arg2);
>> > + else
>> > + cap_raise(new->cap_ambient, arg2);
>> > + return commit_creds(new);
>> > +
>>
>> This let you add capabilities you don't even have to cap_ambient. I'm
>> fine with that as long as the cap evolution rule changes, as above.
>
> How about if instead we do restrict it to what's in pP? I don't
> want CAP_SETPCAP to become a cheap way to get all caps back. With
> or without NNP.

We'd also have to modify everything that can change pP to change pA as
well if we went this route. I'd be okay with that, but it would make
the patch much larger, and I'm not entirely sure I see the benefit.
It would keep the number of possible states smaller, which could be
nice.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/