Re: For review (v2): user_namespaces(7) man page

From: richard -rw- weinberger
Date: Fri Apr 26 2013 - 01:48:16 EST

On Fri, Apr 26, 2013 at 2:54 AM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
> richard -rw- weinberger <richard.weinberger@xxxxxxxxx> writes:
>> On Wed, Mar 27, 2013 at 10:26 PM, Michael Kerrisk (man-pages)
>> <mtk.manpages@xxxxxxxxx> wrote:
>>> Inside the user namespace, the shell has user and group ID 0,
>>> and a full set of permitted and effective capabilities:
>>> bash$ cat /proc/$$/status | egrep '^[UG]id'
>>> Uid: 0 0 0 0
>>> Gid: 0 0 0 0
>>> bash$ cat /proc/$$/status | egrep '^Cap(Prm|Inh|Eff)'
>>> CapInh: 0000000000000000
>>> CapPrm: 0000001fffffffff
>>> CapEff: 0000001fffffffff
>> I've tried your demo program, but inside the new ns I'm automatically nobody.
>> As Eric said, setuid(0)/setgid(0) are missing.
> Is it the setuid/setgid or not setting up the uid/gid map?

The uid/gid mappings are set up.
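
For reference, a minimal sketch of how one mapping line can be written. Each line of /proc/PID/uid_map (and gid_map) has the form "ID-inside-ns ID-outside-ns length". The helper names format_id_map() and write_id_map() and the example IDs are illustrative assumptions; they are not taken from the demo program.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: format one map line,
 * "<ID inside ns> <ID outside ns> <length>\n".
 * Returns the number of bytes stored in buf, or -1 if buf is too small. */
static int format_id_map(char *buf, size_t size, unsigned long inside,
                         unsigned long outside, unsigned long count)
{
    assert(buf != NULL);
    int n = snprintf(buf, size, "%lu %lu %lu\n", inside, outside, count);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Hypothetical helper: write the line to map_path (for example
 * "/proc/<pid>/uid_map") with a single write() call.
 * Returns 0 on success or a negative errno value on failure. */
static int write_id_map(const char *map_path, unsigned long inside,
                        unsigned long outside, unsigned long count)
{
    char line[64];
    int len = format_id_map(line, sizeof(line), inside, outside, count);
    if (len < 0)
        return -EOVERFLOW;

    int fd = open(map_path, O_WRONLY);
    if (fd < 0)
        return -errno;

    ssize_t written = write(fd, line, (size_t)len);
    int err = (written == len) ? 0 : (written < 0 ? -errno : -EIO);
    close(fd);
    return err;
}
```

A parent process could call write_id_map("/proc/<pid>/uid_map", 0, 1000, 1) to map uid 0 inside the namespace to uid 1000 outside it; the value 1000 is only an example.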

>> Eric, maybe you can help me. How can I drop capabilities within a user
>> namespace?
>> In childFunc() I added prctl(PR_CAPBSET_DROP, CAP_NET_ADMIN), but it always
>> fails with EPERM.
>> Why is that? I thought I would get a completely fresh set of capabilities
>> that I can modify.
>> I don't want uid 0 inside the container to have all capabilities.
> There are weird things that happen with exec and the user namespace. If
> you have exec'd as an unmapped user, all of your capabilities have
> already been dropped.

I've set up the mappings. If I look into /proc/*/status, I see that my
process has all capabilities.
So, in general, is it possible to drop capabilities within a user namespace?
I really want to drop CAP_NET_ADMIN and some others.
Root within my container must not be able to change any networking settings.
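
To make the question concrete, here is a minimal sketch of what I am checking and attempting. It assumes the hex masks are read from the Cap* lines of /proc/PID/status; the helper names are made up for illustration. Note that PR_CAPBSET_DROP only affects the bounding set (which limits what can be gained on a later exec) and, per prctl(2), fails with EPERM when the caller lacks CAP_SETPCAP. It does not remove a capability from the current permitted or effective sets. Whether that accounts for the error above is not something this sketch settles.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/prctl.h>
#include <linux/capability.h>

/* Hypothetical helper: return 1 if capability `cap` is set in a hex mask
 * as printed on the CapPrm/CapEff lines of /proc/<pid>/status, else 0. */
static int cap_in_mask(const char *hex_mask, int cap)
{
    assert(hex_mask != NULL && cap >= 0 && cap < 64);
    unsigned long long mask = strtoull(hex_mask, NULL, 16);
    return (int)((mask >> cap) & 1ULL);
}

/* Hypothetical helper: try to remove `cap` from the calling thread's
 * capability bounding set.
 * Returns 0 on success or a negative errno value (e.g. -EPERM). */
static int drop_bounding_cap(int cap)
{
    if (prctl(PR_CAPBSET_DROP, (unsigned long)cap, 0UL, 0UL, 0UL) == -1)
        return -errno;
    return 0;
}

/* Hypothetical helper: return 1 if `cap` is in the bounding set,
 * 0 if it is not, or a negative errno value on error. */
static int bounding_cap_present(int cap)
{
    int r = prctl(PR_CAPBSET_READ, (unsigned long)cap, 0UL, 0UL, 0UL);
    return r == -1 ? -errno : r;
}
```

With the CapEff value shown earlier, cap_in_mask("0000001fffffffff", CAP_NET_ADMIN) reports that the capability is present. A caller would then invoke drop_bounding_cap(CAP_NET_ADMIN) before exec'ing the container's init and inspect the returned errno value.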

>> And why does /proc/*/loginuid always contain 4294967295 in a new user namespace?
>> Writing to it also fails. (Noticed that because does not work).
> Almost certainly because the loginuid has already been set. Yes. It
> looks like I am simply using from_kuid instead of from_kuid_munged on
> the read. So an unmapped loginuid will be reported as 4294967295.
> In some circumstances 65534 (nobody) is definitely better, in some it is
> a toss-up, and most of the time no one really cares. So I have tried to
> do something, but in this case I don't know which is the best policy.

Hmm, I had hoped that loginuid would be reset upon entering a user namespace.
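
For anyone reading the file from userspace, a small sketch of how the reported value can be interpreted. 4294967295 is (uint32_t)-1, the "no valid ID" value, and 65534 is the usual overflow ID ("nobody"). The enum and function names are illustrative, and 65534 is assumed to be the overflowuid setting on the system in question.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LOGINUID_INVALID_ID ((uint32_t)-1) /* prints as 4294967295 */
#define ASSUMED_OVERFLOW_UID 65534u        /* assumption: default overflowuid */

/* Illustrative classification of a value read from /proc/<pid>/loginuid. */
enum loginuid_kind {
    LOGINUID_INVALID,  /* unset, or not mapped in the reader's namespace */
    LOGINUID_OVERFLOW, /* reported as the overflow uid ("nobody") */
    LOGINUID_MAPPED    /* an ordinary, mapped uid */
};

/* Hypothetical helper: classify the decimal text read from the file. */
static enum loginuid_kind classify_loginuid(const char *text)
{
    assert(text != NULL);
    unsigned long value = strtoul(text, NULL, 10);

    if (value == LOGINUID_INVALID_ID)
        return LOGINUID_INVALID;
    if (value == ASSUMED_OVERFLOW_UID)
        return LOGINUID_OVERFLOW;
    return LOGINUID_MAPPED;
}
```

With the current behaviour, a tool would see LOGINUID_INVALID inside the new namespace. If the read switched to from_kuid_munged, it would presumably see LOGINUID_OVERFLOW instead.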

>> Final question: is it by design that uid 0 within a namespace is not
>> allowed to write to
>> /proc/*/oom_score_adj?
> Essentially. It is by design that uid 0 within a namespace be mapped to
> some other uid outside the namespace, and that the permissions on writes
> should use the permission needed outside of the user namespace.

Okay, I asked because systemd is a heavy user of this file and fails
within a user namespace because of this.
Luckily, it is possible to remove all the score changes from the .service files.
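
An alternative for a service manager would be to treat a refused write as non-fatal. A minimal sketch, assuming the caller passes the path of the oom_score_adj file and a score in the documented range of -1000 to 1000; the function names are invented for illustration and do not describe how systemd actually handles this.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `score` to an oom_score_adj file.
 * Returns 0 on success or a negative errno value on failure. */
static int write_oom_score_adj(const char *path, int score)
{
    assert(path != NULL && score >= -1000 && score <= 1000);

    char text[16];
    int len = snprintf(text, sizeof(text), "%d\n", score);

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    ssize_t written = write(fd, text, (size_t)len);
    int err = (written < 0) ? -errno : 0;
    close(fd);
    return err;
}

/* Hypothetical best-effort wrapper: returns 1 if the score was applied,
 * 0 if the kernel refused for permission reasons (caller may carry on),
 * or a negative errno value for any other failure. */
static int try_set_oom_score_adj(const char *path, int score)
{
    int err = write_oom_score_adj(path, score);
    if (err == -EACCES || err == -EPERM)
        return 0;
    return err == 0 ? 1 : err;
}
```

Called as try_set_oom_score_adj("/proc/self/oom_score_adj", score), a return value of 0 would let the caller log the refusal and continue starting the service.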
