Re: [RFC] Virtualization steps

From: Eric W. Biederman
Date: Fri Mar 31 2006 - 00:58:49 EST


Chris Wright <chrisw@xxxxxxxxxxxx> writes:

> * Eric W. Biederman (ebiederm@xxxxxxxxxxxx) wrote:
>> As I currently understand the problem, everything goes along nicely,
>> with nothing really special needed, until you start asking the question
>> of how to implement a root user with uid 0 who does not own the
>> machine. When you start asking that question is when the creepy
>> crawlies come out.
>
> Hehe. uid 0 _and_ full capabilities. So reducing capabilities is one
> relatively easy way to handle that.

It comes close, but capabilities are not currently factored correctly.

> And, if you have a security module
> loaded it's going to use security labels, which can be richer than both
> uid and capabilities combined.

Exactly. You can define the semantics with a security module,
but you cannot define the semantics in terms of uids.

>> On most virtual filesystems the default owner of files is uid 0.
>> Additional privilege checks are not applied. Writing to those
>> files could potentially have global effect.
>
> Yes, many (albeit far from all) have a capable() check as well.

Nothing controlled by sysctl has a capable() check, except
the capabilities sysctl. The default, if not the norm, is not
to apply capability checks.

>> It is completely unclear how permission checks should work
>> between two processes in different uid namespaces, especially
>> since there are cases where you do want interactions.
>
> Are there? Why put them in different containers then? I'd think
> network sockets are the extent of the interaction you'd want. Sharing
> a filesystem does leave room for named pipes and unix domain sockets (also
> in the abstract namespace). And considering the side channel in unix
> domain sockets, they become a potential hole. So for solid isolation,
> I'd expect access to those to be disallowed when the object owner is in a
> different security context from the context that is trying to attach.

Yes. My current implementation closes off all of that visibility
when you create a new network namespace. But there are still
interactions. For me it isn't a real problem, though, as I have
a single system administrator and synchronized user ids. For
other use cases it is a different story.

In a more normal use case, the container admin can't get out, but
the box admin can get in, at least for simple things like monitoring
and possibly some debugging.

Or you get weird cases where you want the container to have access
to some of the files in /proc, but not all of them.

If I am the machine admin and I discover that a process in
a container has a bug and is going wild, it is preferable
to kill that process, or possibly that container, rather than
reboot the box to solve the problem.

All of the normal, everyday interactions are handled fine, and there
is simply no visibility. But I never expect perfect isolation
from the machine admin.

I do still need to read up on the SELinux mandatory access controls,
although the comment in the NSA SELinux FAQ, that SELinux is just a
proof of concept and that no security bugs were discovered or looked
for during its implementation, scares me.


Eric