Re: For review (v2): user_namespaces(7) man page

From: Eric W. Biederman
Date: Thu Apr 25 2013 - 20:54:53 EST


richard -rw- weinberger <richard.weinberger@xxxxxxxxx> writes:

> On Wed, Mar 27, 2013 at 10:26 PM, Michael Kerrisk (man-pages)
> <mtk.manpages@xxxxxxxxx> wrote:
>> Inside the user namespace, the shell has user and group ID 0,
>> and a full set of permitted and effective capabilities:
>>
>> bash$ cat /proc/$$/status | egrep '^[UG]id'
>> Uid: 0 0 0 0
>> Gid: 0 0 0 0
>> bash$ cat /proc/$$/status | egrep '^Cap(Prm|Inh|Eff)'
>> CapInh: 0000000000000000
>> CapPrm: 0000001fffffffff
>> CapEff: 0000001fffffffff
>
> I've tried your demo program, but inside the new ns I'm automatically nobody.
> As Eric said, setuid(0)/setgid(0) are missing.

Is it the setuid/setgid or not setting up the uid/gid map?
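
If the map is what is missing, a minimal sketch of setting it up from the parent could look like the following. The helper names format_id_map() and write_id_map() are hypothetical and not taken from the demo program. The only real interface assumed is the /proc/PID/uid_map and /proc/PID/gid_map files, which take lines of the form "ID-inside-ns ID-outside-ns count".

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: format one map line,
 * "<id inside ns> <id outside ns> <count>\n".
 * Returns the line length, or -1 if the buffer is too small. */
int format_id_map(char *buf, size_t len,
                  unsigned inside, unsigned outside, unsigned count)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, "%u %u %u\n", inside, outside, count);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Hypothetical helper: write a single-line mapping to /proc/<pid>/<file>,
 * where file is "uid_map" or "gid_map".
 * Returns 0 on success, -1 on failure (errno is set by open/write). */
int write_id_map(pid_t pid, const char *file,
                 unsigned inside, unsigned outside, unsigned count)
{
    char path[64], line[64];
    int fd, n, ok;

    n = format_id_map(line, sizeof line, inside, outside, count);
    if (n < 0)
        return -1;
    if (snprintf(path, sizeof path, "/proc/%ld/%s", (long)pid, file)
            >= (int)sizeof path)
        return -1;

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    /* The whole map has to go in with a single write(). */
    ok = (write(fd, line, (size_t)n) == (ssize_t)n);
    close(fd);
    return ok ? 0 : -1;
}
```

The parent would call something like write_id_map(child_pid, "uid_map", 0, getuid(), 1) after clone(CLONE_NEWUSER) and before the child execs. Note that kernels from 3.19 on additionally expect "deny" to be written to /proc/PID/setgroups before an unprivileged process may write gid_map.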

> Eric, maybe you can help me. How can I drop capabilities within a user
> namespace?

> In childFunc() I added prctl(PR_CAPBSET_DROP, CAP_NET_ADMIN), but it always
> fails with EPERM.
> Why is that? I thought I would get a completely fresh set of caps which I can modify.
> I don't want uid 0 inside the container to have all caps.

There are weird things that happen with exec and the user namespace. If
you have exec'd as an unmapped user, all of your capabilities have
already been dropped.
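
One way to check whether that has happened is to look at CapEff before calling prctl(). PR_CAPBSET_DROP needs CAP_SETPCAP in the effective set and fails with EPERM without it. The sketch below assumes the helper names cap_mask_has(), read_cap_eff() and drop_net_admin_from_bounding_set(), all of which are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <linux/capability.h>

/* Hypothetical helper: is capability number `cap` set in a mask such as
 * the CapEff value shown in /proc/PID/status? */
int cap_mask_has(uint64_t mask, unsigned cap)
{
    assert(cap < 64);
    return (int)((mask >> cap) & 1u);
}

/* Hypothetical helper: read this process's effective capability mask
 * from /proc/self/status. Returns 0 on success, -1 on failure. */
int read_cap_eff(uint64_t *mask)
{
    char line[256];
    int found = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long long v;

        if (sscanf(line, "CapEff: %llx", &v) == 1) {
            *mask = v;
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Hypothetical helper: drop CAP_NET_ADMIN from the bounding set.
 * Returns 0 on success, -1 with errno set on failure. */
int drop_net_admin_from_bounding_set(void)
{
    uint64_t eff;

    /* Without CAP_SETPCAP the prctl() below would fail with EPERM,
     * so report that up front when CapEff can be read. */
    if (read_cap_eff(&eff) == 0 && !cap_mask_has(eff, CAP_SETPCAP)) {
        errno = EPERM;
        return -1;
    }
    return prctl(PR_CAPBSET_DROP, CAP_NET_ADMIN, 0, 0, 0);
}
```

If CapEff reads as all zeroes in the child, the capabilities were lost before the prctl() call, which would point at the exec or the missing map rather than at the bounding set itself.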

> And why does /proc/*/loginuid always contain 4294967295 in a new user namespace?
> Writing to it also fails. (Noticed that because pam_loginuid.so does not work).

Almost certainly because the loginuid has already been set. Yes. It
looks like I am simply using from_kuid instead of from_kuid_munged on
the read. So an unmapped loginuid will be reported as 4294967295.

In some circumstances 65534 (nobody) is definitely better, in some it is
a toss-up, and most of the time no one really cares. So I have tried to
do something, but in this case I don't know which is the best policy.

> Final question, is it by design that uid 0 within a namespace is not
> allowed to write to
> /proc/*/oom_score_adj?

Essentially. It is by design that uid 0 within a namespace be mapped to
some other uid outside the namespace, and that the permissions on writes
should use the permission needed outside of the user namespace.

Which means there are all kinds of things only uid 0 can write to that
you can't touch in a user namespace. For some of those things the policy
may need to be reconsidered. For a lot of those things the default policy
is good. Regardless, we are now defaulting to not letting root in a
container do risky things, which is a good thing.

Eric