Re: Documenting the ioctl interfaces to discover relationships between namespaces
From: Michael Kerrisk (man-pages)
Date: Thu Dec 15 2016 - 05:06:46 EST
On 12/15/2016 01:46 AM, Andrei Vagin wrote:
> On Sun, Dec 11, 2016 at 12:54:56PM +0100, Michael Kerrisk (man-pages) wrote:
>> [was: [PATCH 0/4 v3] Add an interface to discover relationships
>> between namespaces]
>>
>> Hello Andrei
>>
>> See below for my attempt to document the following.
>
> Hi Michael,
>
> Eric already did my work:). I have read this documentation and it looks
> good to me. I have nothing to add to Eric's comments.
Thanks, Andrei!
Cheers,
Michael
>>
>> On 6 September 2016 at 09:47, Andrei Vagin <avagin@xxxxxxxxxx> wrote:
>>> From: Andrey Vagin <avagin@xxxxxxxxxx>
>>>
>>> Each namespace has an owning user namespace, and currently there is
>>> no way to discover these relationships.
>>>
>>> PID and user namespaces are hierarchical. There is no way to discover
>>> parent-child relationships either.
>>>
>>> Why might we want to know the relationships between namespaces?
>>>
>>> One use would be visualization, in order to understand the running
>>> system. Another would be to answer the question: what capability does
>>> process X have to perform operations on a resource governed by namespace
>>> Y?
>>>
>>> One more use case (which is usually considered abnormal) is checkpoint/restart.
>>> In CRIU we are going to dump and restore nested namespaces.
>>>
>>> There was a discussion [1] about which interface to choose to determine
>>> relationships between namespaces.
>>>
>>> Eric suggested adding two ioctls [2]:
>>>> Grumble, Grumble. I think this may actually be a case for creating ioctls
>>>> for these two cases. Now that random nsfs file descriptors are bind
>>>> mountable the original reason for using proc files is not as pressing.
>>>>
>>>> One ioctl for the user namespace that owns a file descriptor.
>>>> One ioctl for the parent namespace of a namespace file descriptor.
>>>
>>> Here is an implementation of these ioctls.
>>>
>>> $ man man7/namespaces.7
>>> ...
>>> Since Linux 4.X, the following ioctl(2) calls are supported for
>>> namespace file descriptors. The correct syntax is:
>>>
>>> fd = ioctl(ns_fd, ioctl_type);
>>>
>>> where ioctl_type is one of the following:
>>>
>>> NS_GET_USERNS
>>> Returns a file descriptor that refers to an owning user
>>> namespace.
>>>
>>> NS_GET_PARENT
>>> Returns a file descriptor that refers to a parent namespace.
>>> This ioctl(2) can be used for pid and user namespaces. For
>>> user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
>>> meaning.
>>>
>>> In addition to generic ioctl(2) errors, the following specific ones
>>> can occur:
>>>
>>> EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
>>>
>>> EPERM The requested namespace is outside of the current namespace
>>> scope.
>>>
>>> [1] https://lkml.org/lkml/2016/7/6/158
>>> [2] https://lkml.org/lkml/2016/7/9/101
>>
>> The following is the text I propose to add to the namespaces(7) page.
>> Could you please review and let me know of corrections and
>> improvements.
>>
>> Thanks,
>>
>> Michael
>>
>>
>> Introspecting namespace relationships
>> Since Linux 4.9, two ioctl(2) operations are provided to allow
>> introspection of namespace relationships (see user_namespaces(7)
>> and pid_namespaces(7)). The form of the calls is:
>>
>> ioctl(fd, request);
>>
>> In each case, fd refers to a /proc/[pid]/ns/* file.
>>
>> NS_GET_USERNS
>> Returns a file descriptor that refers to the owning user
>> namespace for the namespace referred to by fd.
>>
>> NS_GET_PARENT
>> Returns a file descriptor that refers to the parent
>> namespace of the namespace referred to by fd. This operation is
>> valid only for hierarchical namespaces (i.e., PID and user
>> namespaces). For user namespaces, NS_GET_PARENT is synonymous
>> with NS_GET_USERNS.
>>
>> In each case, the returned file descriptor is opened with O_RDONLY
>> and O_CLOEXEC (close-on-exec).
>>
>> By applying fstat(2) to the returned file descriptor, one obtains
>> a stat structure whose st_ino (inode number) field identifies the
>> owning/parent namespace. This inode number can be matched with
>> the inode number of another /proc/[pid]/ns/{pid,user} file to
>> determine whether that is the owning/parent namespace.
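
[A minimal sketch of the comparison described above. The helper name
same_namespace() is made up for illustration. Note one assumption that
goes beyond the text: the sketch also compares st_dev, on the general
principle that an inode number is only unique within a device.]

```c
#include <fcntl.h>      /* For open() in callers of the helper */
#include <sys/stat.h>

/* Hypothetical helper: return 1 if the two descriptors refer to the
   same file (and thus, for 'ns' files, the same namespace), 0 if
   they differ, and -1 if fstat() fails on either descriptor. */
static int
same_namespace(int fd1, int fd2)
{
    struct stat sb1, sb2;

    if (fstat(fd1, &sb1) == -1 || fstat(fd2, &sb2) == -1)
        return -1;

    /* Comparing st_dev as well as st_ino is an added precaution,
       not something the text above requires */
    return sb1.st_dev == sb2.st_dev && sb1.st_ino == sb2.st_ino;
}
```

[One could pass it the descriptor returned by NS_GET_USERNS together
with a descriptor obtained by opening another /proc/[pid]/ns/user file.]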
>>
>> Either of these ioctl(2) operations can fail with the following
>> error:
>>
>> EPERM The requested namespace is outside of the caller's namespace
>> scope. This error can occur if, for example, the owning
>> user namespace is an ancestor of the caller's current
>> user namespace. It can also occur on attempts to obtain
>> the parent of the initial user or PID namespace.
>>
>> Additionally, the NS_GET_PARENT operation can fail with the following
>> error:
>>
>> EINVAL fd refers to a nonhierarchical namespace.
>>
>> See the EXAMPLE section for an example of the use of these operations.
>>
>> [...]
>>
>> EXAMPLE
>> The example shown below uses the ioctl(2) operations described
>> above to perform simple introspection of namespace relationships.
>> The following shell sessions show various examples of the use of
>> this program.
>>
>> Trying to get the parent of the initial user namespace fails, for
>> the reasons explained earlier:
>>
>> $ ./ns_introspect /proc/self/ns/user p
>> The parent namespace is outside your namespace scope
>>
>> Create a process running sleep(1) that resides in new user and UTS
>> namespaces, and show that the new UTS namespace is associated with the
>> new user namespace:
>>
>> $ unshare -Uu sleep 1000 &
>> [1] 23235
>> $ ./ns_introspect /proc/23235/ns/uts
>> Inode number of owning user namespace is: 4026532448
>> $ readlink /proc/23235/ns/user
>> user:[4026532448]
>>
>> Then show that the parent of the new user namespace in the preceding
>> example is the initial user namespace:
>>
>> $ readlink /proc/self/ns/user
>> user:[4026531837]
>> $ ./ns_introspect /proc/23235/ns/user
>> Inode number of owning user namespace is: 4026531837
>>
>> Start a shell in a new user namespace, and show that from within
>> this shell, the parent user namespace can't be discovered. Similarly,
>> the owning user namespace of the UTS namespace (which is the initial
>> user namespace) can't be discovered.
>>
>> $ PS1="sh2$ " unshare -U bash
>> sh2$ ./ns_introspect /proc/self/ns/user p
>> The parent namespace is outside your namespace scope
>> sh2$ ./ns_introspect /proc/self/ns/uts u
>> The owning user namespace is outside your namespace scope
>>
>> Program source
>>
>> /* ns_introspect.c
>>
>>    Licensed under GNU General Public License v2 or later
>> */
>> #include <stdlib.h>
>> #include <unistd.h>
>> #include <stdio.h>
>> #include <sys/stat.h>
>> #include <fcntl.h>
>> #include <sys/ioctl.h>
>> #include <string.h>
>> #include <errno.h>
>>
>> #ifndef NS_GET_USERNS
>> #define NSIO    0xb7
>> #define NS_GET_USERNS   _IO(NSIO, 0x1)
>> #define NS_GET_PARENT   _IO(NSIO, 0x2)
>> #endif
>>
>> int
>> main(int argc, char *argv[])
>> {
>>     int fd, userns_fd, parent_fd;
>>     struct stat sb;
>>
>>     if (argc < 2) {
>>         fprintf(stderr, "Usage: %s /proc/[pid]/ns/[file] [p|u]\n",
>>                 argv[0]);
>>         fprintf(stderr, "\nDisplay the result of one or both "
>>                 "of NS_GET_USERNS (u) or NS_GET_PARENT (p)\n"
>>                 "for the specified /proc/[pid]/ns/[file]. If neither "
>>                 "'p' nor 'u' is specified,\n"
>>                 "NS_GET_USERNS is the default.\n");
>>         exit(EXIT_FAILURE);
>>     }
>>
>>     /* Obtain a file descriptor for the 'ns' file specified
>>        in argv[1] */
>>
>>     fd = open(argv[1], O_RDONLY);
>>     if (fd == -1) {
>>         perror("open");
>>         exit(EXIT_FAILURE);
>>     }
>>
>>     /* Obtain a file descriptor for the owning user namespace and
>>        then obtain and display the inode number of that namespace */
>>
>>     if (argc < 3 || strchr(argv[2], 'u')) {
>>         userns_fd = ioctl(fd, NS_GET_USERNS);
>>
>>         if (userns_fd == -1) {
>>             if (errno == EPERM)
>>                 printf("The owning user namespace is outside "
>>                         "your namespace scope\n");
>>             else
>>                 perror("ioctl-NS_GET_USERNS");
>>             exit(EXIT_FAILURE);
>>         }
>>
>>         if (fstat(userns_fd, &sb) == -1) {
>>             perror("fstat-userns");
>>             exit(EXIT_FAILURE);
>>         }
>>         printf("Inode number of owning user namespace is: %llu\n",
>>                 (unsigned long long) sb.st_ino);
>>
>>         close(userns_fd);
>>     }
>>
>>     /* Obtain a file descriptor for the parent namespace and
>>        then obtain and display the inode number of that namespace */
>>
>>     if (argc > 2 && strchr(argv[2], 'p')) {
>>         parent_fd = ioctl(fd, NS_GET_PARENT);
>>
>>         if (parent_fd == -1) {
>>             if (errno == EINVAL)
>>                 printf("Can't get parent namespace of a "
>>                         "nonhierarchical namespace\n");
>>             else if (errno == EPERM)
>>                 printf("The parent namespace is outside "
>>                         "your namespace scope\n");
>>             else
>>                 perror("ioctl-NS_GET_PARENT");
>>             exit(EXIT_FAILURE);
>>         }
>>
>>         if (fstat(parent_fd, &sb) == -1) {
>>             perror("fstat-parentns");
>>             exit(EXIT_FAILURE);
>>         }
>>         printf("Inode number of parent namespace is: %llu\n",
>>                 (unsigned long long) sb.st_ino);
>>
>>         close(parent_fd);
>>     }
>>
>>     exit(EXIT_SUCCESS);
>> }
>>
>>
>> --
>> Michael Kerrisk
>> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
>> Linux/UNIX System Programming Training: http://man7.org/training/
>
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/