Re: For review: user_namespace(7) man page
From: Michael Kerrisk (man-pages)
Date: Tue Sep 09 2014 - 10:01:03 EST
Hi Andy, and Eric,
On 09/01/2014 01:57 PM, Andy Lutomirski wrote:
> On Wed, Aug 20, 2014 at 4:36 PM, Michael Kerrisk (man-pages)
> <mtk.manpages@xxxxxxxxx> wrote:
>> Hello Eric et al.,
>>
>> For various reasons, my work on the namespaces man pages
>> fell off the table a while back. Nevertheless, the pages have
>> been close to completion for a while now, and I recently restarted,
>> in an effort to finish them. As you also noted to me f2f, there have
>> recently been some small namespace changes that may affect
>> the content of the pages. Therefore, I'll take the opportunity to
>> send the namespace-related pages out for further (final?) review.
>>
>> So, here, I start with the user_namespaces(7) page, which is shown
>> in rendered form below, with source attached to this mail. I'll
>> send various other pages in follow-on mails.
>>
>> Review comments/suggestions for improvements / bug fixes welcome.
>>
>> Cheers,
>>
>> Michael
>>
>> ==
>>
>> NAME
>> user_namespaces - overview of Linux user_namespaces
>>
>> DESCRIPTION
>> For an overview of namespaces, see namespaces(7).
>>
>> User namespaces isolate security-related identifiers and
>> attributes, in particular, user IDs and group IDs (see credentials(7)),
>> the root directory, keys (see keyctl(2)), and capabili-
>
> Putting "root directory" here is odd -- that's really part of a
> different namespace. But user namespaces sort of isolate the other
> namespaces from each other.
I'm trying to remember the details here. I think this piece originally
came after a discussion with Eric, but I am not sure. Eric?
> Also, ugh, keys. How did keyctl(2) ever make it through any kind of review?
>
>> ties (see capabilities(7)). A process's user and group IDs can
>> be different inside and outside a user namespace. In particular,
>> a process can have a normal unprivileged user ID outside a user
>> namespace while at the same time having a user ID of 0 inside the
>> namespace; in other words, the process has full privileges for
>> operations inside the user namespace, but is unprivileged for
>> operations outside the namespace.
>>
>> Nested namespaces, namespace membership
>> User namespaces can be nested; that is, each user namespace, except
>> the initial ("root") namespace, has a parent user namespace, and can
>> have zero or more child user namespaces. The parent user namespace is
>> the user namespace of the process that creates the user namespace via
>> a call to unshare(2) or clone(2) with the CLONE_NEWUSER flag.
>>
>> The kernel imposes (since version 3.11) a limit of 32 nested levels
>> of user namespaces. Calls to unshare(2) or clone(2) that
>> would cause this limit to be exceeded fail with the error EUSERS.
>>
>> Each process is a member of exactly one user namespace. A
>> process created via fork(2) or clone(2) without the CLONE_NEWUSER
>> flag is a member of the same user namespace as its parent. A
>> process can join another user namespace with setns(2) if it has
>> the CAP_SYS_ADMIN capability in that namespace; upon doing so, it gains a
>> full set of capabilities in that namespace.
>>
>> A call to clone(2) or unshare(2) with the CLONE_NEWUSER flag
>> makes the new child process (for clone(2)) or the caller (for
>> unshare(2)) a member of the new user namespace created by the
>> call.
>>
>> Capabilities
>> The child process created by clone(2) with the CLONE_NEWUSER flag
>> starts out with a complete set of capabilities in the new user
>> namespace. Likewise, a process that creates a new user namespace
>> using unshare(2) or joins an existing user namespace using
>> setns(2) gains a full set of capabilities in that namespace. On
>> the other hand, that process has no capabilities in the parent
>> (in the case of clone(2)) or previous (in the case of unshare(2)
>> and setns(2)) user namespace, even if the new namespace is created
>> or joined by the root user (i.e., a process with user ID 0
>> in the root namespace).
>>
>> Note that a call to execve(2) will cause a process to lose any
>> capabilities that it has, unless it has a user ID of 0 within the
>> namespace.
>
> Or unless file capabilities have a non-empty inheritable mask.
>
> It may be worth mentioning that execve in a user namespace works
> exactly like execve outside a userns.
I've reworded that para to say:
Note that a call to execve(2) will cause a process's capabilities
to be recalculated in the usual way (see capabilities(7)), so that
usually, unless it has a user ID of 0 within the namespace or the
executable file has a nonempty inheritable capabilities mask, it
will lose all capabilities. See the discussion of user and group
ID mappings, below.
Okay?
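As a rough illustration of "recalculated in the usual way", the following Python sketch applies the transformation rules given in capabilities(7) to bit masks. The function name caps_after_execve, the 64-bit ALL_CAPS mask, and the single uid_is_zero flag are assumptions made for this example; in particular, the real rules for root distinguish real and effective user IDs and depend on the securebits settings, which the sketch ignores.

```python
ALL_CAPS = (1 << 64) - 1  # assumption: capability sets modelled as 64-bit masks


def caps_after_execve(p_inheritable, f_permitted, f_inheritable, f_effective,
                      bounding=ALL_CAPS, uid_is_zero=False):
    """Return (permitted, effective, inheritable) after a simulated execve.

    p_inheritable: the process's inheritable set before the exec
    f_permitted, f_inheritable: the file's capability sets
    f_effective: the file's effective bit (True or False)
    bounding: the capability bounding set
    uid_is_zero: simplified stand-in for "user ID 0 within the namespace"
    """
    if uid_is_zero:
        # Simplification: for root, the file sets are treated as all ones
        # and the file effective bit as set.
        f_permitted = ALL_CAPS
        f_inheritable = ALL_CAPS
        f_effective = True
    new_permitted = (p_inheritable & f_inheritable) | (f_permitted & bounding)
    new_effective = new_permitted if f_effective else 0
    new_inheritable = p_inheritable  # unchanged across execve
    return new_permitted, new_effective, new_inheritable
```

With no file capabilities and a nonzero user ID, the permitted and effective sets come out empty, which matches the "will lose all capabilities" case in the reworded paragraph.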
>
>> $ cat /proc/$$/uid_map
>> 0 0 4294967295
>>
>> This mapping tells us that the range starting at user ID 0 in
>> this namespace maps to a range starting at 0 in the (nonexistent)
>> parent namespace, and the length of the range is the largest
>> 32-bit unsigned integer.
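To make the three fields concrete, here is a minimal Python sketch of how the contents of a uid_map (or gid_map) file could be interpreted in user space. The names IdRange, parse_id_map, and to_parent_id are hypothetical, chosen for this illustration; this is not how the kernel implements the lookup.

```python
from typing import List, NamedTuple, Optional


class IdRange(NamedTuple):
    inside_start: int   # field 1: first ID of the range inside the namespace
    outside_start: int  # field 2: first ID of the range in the parent namespace
    length: int         # field 3: number of IDs in the range


def parse_id_map(text: str) -> List[IdRange]:
    """Parse the text of a uid_map or gid_map file into a list of ranges."""
    ranges = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        inside, outside, length = (int(field) for field in line.split())
        ranges.append(IdRange(inside, outside, length))
    return ranges


def to_parent_id(ranges: List[IdRange], inside_id: int) -> Optional[int]:
    """Translate an ID inside the namespace to the parent namespace.

    Returns None when no range covers the ID, that is, the ID is unmapped.
    """
    for r in ranges:
        if r.inside_start <= inside_id < r.inside_start + r.length:
            return r.outside_start + (inside_id - r.inside_start)
    return None
```

For example, passing the text read from /proc/self/uid_map to parse_id_map() and then calling to_parent_id() with that result and 0 shows which parent-namespace user ID backs user ID 0 in the current namespace, if any.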
>>
>> Defining user and group ID mappings: writing to uid_map and gid_map
>> After the creation of a new user namespace, the uid_map file of
>> one of the processes in the namespace may be written to once to
>> define the mapping of user IDs in the new user namespace. An
>> attempt to write more than once to a uid_map file in a user
>> namespace fails with the error EPERM. Similar rules apply for
>> gid_map files.
>>
>> The lines written to uid_map (gid_map) must conform to the
>> following rules:
>>
>> * The three fields must be valid numbers, and the last field
>> must be greater than 0.
>>
>> * Lines are terminated by newline characters.
>>
>> * There is an (arbitrary) limit on the number of lines in the
>> file. As at Linux 3.8, the limit is five lines. In addition,
>> the number of bytes written to the file must be less than the
>> system page size, and the write must be performed at the start
>> of the file (i.e., lseek(2) and pwrite(2) can't be used to
>> write to nonzero offsets in the file).
>>
>> * The range of user IDs (group IDs) specified in each line cannot
>> overlap with the ranges in any other lines. In the initial
>> implementation (Linux 3.8), this requirement was satisfied
>> by a simplistic implementation that imposed the further
>> requirement that the values in both field 1 and field 2 of
>> successive lines must be in ascending numerical order, which
>> prevented some otherwise valid maps from being created. Linux
>> 3.9 and later fix this limitation, allowing any valid set of
>> nonoverlapping maps.
>>
>> * At least one line must be written to the file.
>>
>> Writes that violate the above rules fail with the error EINVAL.
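The rules above can be approximated by a user-space pre-check before writing to the file. The following is a sketch only: check_id_map and ranges_overlap are hypothetical names, the page size of 4096 is an assumed default (the real value can be queried with os.sysconf("SC_PAGE_SIZE")), and checking both the inside and outside columns for overlap is my reading of the overlap rule, not a statement of what the kernel does. The once-only write restriction and the offset restriction are not modelled.

```python
from typing import List

MAX_MAP_LINES = 5         # the limit quoted above for Linux 3.8
ASSUMED_PAGE_SIZE = 4096  # assumption: adjust to the system page size


def ranges_overlap(start_a: int, len_a: int, start_b: int, len_b: int) -> bool:
    """True if [start_a, start_a+len_a) and [start_b, start_b+len_b) intersect."""
    return start_a < start_b + len_b and start_b < start_a + len_a


def check_id_map(text: str, max_lines: int = MAX_MAP_LINES,
                 page_size: int = ASSUMED_PAGE_SIZE) -> List[str]:
    """Return a list of rule violations; an empty list means no problem found."""
    problems = []
    if len(text.encode()) >= page_size:
        problems.append("data must be smaller than the page size")
    if text and not text.endswith("\n"):
        problems.append("every line must end with a newline")
    lines = text.splitlines()
    if not lines:
        problems.append("at least one line must be written")
    if len(lines) > max_lines:
        problems.append(f"more than {max_lines} lines")
    ranges = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) != 3 or not all(f.isascii() and f.isdigit() for f in fields):
            problems.append(f"line {number}: expected three numeric fields")
            continue
        inside, outside, length = (int(f) for f in fields)
        if length == 0:
            problems.append(f"line {number}: length must be greater than 0")
            continue
        ranges.append((number, inside, outside, length))
    # Compare every pair of lines; order of the lines does not matter.
    for i, (n_a, in_a, out_a, len_a) in enumerate(ranges):
        for n_b, in_b, out_b, len_b in ranges[i + 1:]:
            if (ranges_overlap(in_a, len_a, in_b, len_b)
                    or ranges_overlap(out_a, len_a, out_b, len_b)):
                problems.append(f"lines {n_a} and {n_b}: ranges overlap")
    return problems
```

A map such as "10 1000 5\n0 2000 5\n" passes this check even though its fields are not in ascending order, in line with the Linux 3.9 behavior described above.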
>>
>> In order for a process to write to the /proc/[pid]/uid_map
>> (/proc/[pid]/gid_map) file, all of the following requirements
>> must be met:
>>
>> 1. The writing process must have the CAP_SETUID (CAP_SETGID)
>> capability in the user namespace of the process pid.
>
> This is checked for the opening process (and I don't actually remember
> whether it's checked for the writing process).
Eric, can you comment?
>>
>> 2. The writing process must be in either the user namespace of
>> the process pid or inside the parent user namespace of the
>> process pid.
>>
>> 3. The mapped user IDs (group IDs) must in turn have a mapping in
>> the parent user namespace.
>>
>> 4. One of the following is true:
>>
>> * The data written to uid_map (gid_map) consists of a single
>> line that maps the writing process's filesystem user ID
>> (group ID) in the parent user namespace to a user ID (group
>> ID) in the user namespace. The usual case here is that
>> this single line provides a mapping for the user ID of the
>> process that created the namespace.
>>
>> * The process has the CAP_SETUID (CAP_SETGID) capability in
>> the parent user namespace. Thus, a privileged process can
>> make mappings to arbitrary user IDs (group IDs) in the parent
>> user namespace.
>
> The opening process.
Fixed.
> One other thing that could be worth mentioning: any non-user
> namespace that's created is owned by the user namespace of the process
> that created it at the time of creation. Actions on those namespaces
> require capabilities in the corresponding user namespace.
I added:
[[
When a non-user-namespace is created,
it is owned by the user namespace in which the creating process
was a member at the time of the creation of the namespace.
Actions on the non-user-namespace
require capabilities in the corresponding user namespace.
]]
> Thanks for doing this!
You're welcome. Thanks for the review!
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/