Re: [PATCH PLACEHOLDER 1/3] fs/exec: "always_unprivileged" patch

From: Will Drewry
Date: Sat Jan 14 2012 - 14:21:14 EST


On Sat, Jan 14, 2012 at 7:30 AM, Jamie Lokier <jamie@xxxxxxxxxxxxx> wrote:
> Linus Torvalds wrote:
>> On Thu, Jan 12, 2012 at 5:11 PM, Andrew Lutomirski <luto@xxxxxxx> wrote:
>> >
>> > What if you're a daemon that needs something like CAP_NET_BIND but
>> > also wants to be able to run other helpers without CAP_NET_BIND?
>> >
>> > (Also, preventing dropping of privileges will probably make a patch
>> > more complicated -- I'll have to find and update all the places that
>> > allow dropping privileges.)
>>
>> Hey, if it actually makes it more complicated to say "don't change
>> privileges", then I guess my argument that it should be simpler is
>> wrong.
>>
>> That said, the thing you bring up is *not* the actual use-case for the
>> suggestion. The use-case is a "run untrusted code". So the use-case
>> would be to set the flag after you've dropped CAP_NET_BIND, and
>> *before* you actually run the other helpers. You clearly must have a
>> fork() or something like that there, since you want to keep the
>> NET_BIND in the original daemon.
>
> Well suppose you don't trust the daemon either.  It might be running
> in a network namespace where it's safe for untrusted code to bind to
> low ports.
>
> Or maybe you just need to let it bind willy-nilly among a restricted
> subset of low ports - which of course you would like to restrict with
> the seccomp filter.

Unless the port values are passed as register arguments, the seccomp
filter won't help. It can still be used to incrementally drop available
system calls (like socketcall(SYS_LISTEN) or whatever).
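
To make the register-versus-memory point concrete, here is a minimal
sketch. It is not code from the patch: struct syscall_view and
view_of_bind() are hypothetical names that only model the data a filter
would be handed for a bind() call.

```c
#include <stdint.h>
#include <netinet/in.h>   /* struct sockaddr_in */

/* Hypothetical stand-in for what a filter is handed: the syscall
 * number and the raw register values, nothing more. */
struct syscall_view {
    int nr;
    uint64_t args[6];
};

/* Register view of bind(fd, addr, sizeof(*addr)). bind_nr is the
 * architecture's bind syscall number, supplied by the caller. */
struct syscall_view view_of_bind(int bind_nr, int fd,
                                 const struct sockaddr_in *addr)
{
    struct syscall_view v = { .nr = bind_nr };

    v.args[0] = (uint64_t)fd;
    v.args[1] = (uint64_t)(uintptr_t)addr;  /* an address; the port lives behind it */
    v.args[2] = (uint64_t)sizeof(*addr);
    return v;
}
```

The port number (addr->sin_port) never shows up in args[]; only the
address of the structure does, so a filter limited to register values
has nothing to compare a port against.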

> (This can't happen right now because the filter can only look at
> arguments, not memory pointed to - so it can't look at the port
> number.  Can it even see when sys_bind is called on archs like x86
> that use sys_socketcall?!)

Yeah - multiplexed system calls like ipc and socketcall can be filtered
based on the argument value in the register. (socketcall's first argument is
"call".)
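
As a rough sketch of filtering on that first argument, the block below
builds a classic-BPF program that fails socketcall(denied_call, ...)
with EPERM and allows everything else. It assumes the seccomp-BPF
userspace interface found in later mainline kernels (struct
seccomp_data, SECCOMP_RET_*), which may differ from the patch under
discussion. build_socketcall_filter() and SOCKETCALL_FILTER_LEN are
hypothetical names, and the architecture check a real filter needs is
left out for brevity.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <linux/filter.h>   /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h>  /* struct seccomp_data, SECCOMP_RET_* */

#define SOCKETCALL_FILTER_LEN 6

/*
 * Hypothetical helper: fill out[] with a filter that rejects
 * socketcall(denied_call, ...) and allows all other system calls.
 * socketcall_nr and denied_call are passed in because their values
 * are architecture- and header-specific.
 */
void build_socketcall_filter(struct sock_filter out[SOCKETCALL_FILTER_LEN],
                             unsigned int socketcall_nr,
                             unsigned int denied_call)
{
    const struct sock_filter prog[SOCKETCALL_FILTER_LEN] = {
        /* [0] A = syscall number */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, nr)),
        /* [1] not socketcall? skip ahead to the allow at [5] */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, socketcall_nr, 0, 3),
        /* [2] A = low 32 bits of args[0], the "call" selector
         *     (this offset assumes a little-endian machine) */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, args[0])),
        /* [3] selector is not the denied one? skip to the allow at [5] */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, denied_call, 0, 1),
        /* [4] fail the call with EPERM */
        BPF_STMT(BPF_RET | BPF_K,
                 SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        /* [5] allow */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };

    memcpy(out, prog, sizeof(prog));
}
```

On i386, for instance, the caller would pass __NR_socketcall (102) and
SYS_LISTEN (4). The filter compares only the selector held in the
register; the argument block that socketcall's second parameter points
to stays out of reach.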

> Anyway the principle is there - CAP_NET_BIND doesn't necessarily mean
> the daemon code is trusted.

I think we're comparing apples to oranges. I believe the current proposal is a
bit that says "hey! I'm sandboxed!". Defensive programming that is often
achieved through continued reduction of capabilities is important, but
orthogonal. In that model, only once the last vestige of "privilege" is dropped
would the process set the no_new_privs bit. Until then, you rely on the
other access control pieces you've put in place: namespacing, etc.
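
A minimal sketch of that last step, assuming the prctl() interface that
mainline Linux later gained for this bit (PR_SET_NO_NEW_PRIVS and
PR_GET_NO_NEW_PRIVS); the patch in this thread may expose it
differently. freeze_privileges() and privileges_frozen() are
hypothetical wrapper names.

```c
#include <sys/prctl.h>

/* Fallback values for older headers; these match <linux/prctl.h>. */
#ifndef PR_SET_NO_NEW_PRIVS
#define PR_SET_NO_NEW_PRIVS 38
#endif
#ifndef PR_GET_NO_NEW_PRIVS
#define PR_GET_NO_NEW_PRIVS 39
#endif

/*
 * Set the bit for the calling thread. This is one-way: it cannot be
 * cleared again and is inherited across fork() and execve().
 * Returns 0 on success, -1 with errno set on failure.
 */
int freeze_privileges(void)
{
    return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/* Returns 1 if the bit is set, 0 if it is not, -1 on error. */
int privileges_frozen(void)
{
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

In the model described above, a process would call freeze_privileges()
only after it has dropped the last capability it needs, and before it
executes anything untrusted.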

While I am a fan of capabilities systems, it would be very cool to have a
bottom floor, privilege-freezer which could help against some classes of
sandbox escapes.

cheers!
will