Documenting the ioctl interfaces to discover relationships between namespaces

From: Michael Kerrisk (man-pages)
Date: Sun Dec 11 2016 - 06:55:26 EST


[was: [PATCH 0/4 v3] Add an interface to discover relationships
between namespaces]

Hello Andrei

See below for my attempt to document the following.

On 6 September 2016 at 09:47, Andrei Vagin <avagin@xxxxxxxxxx> wrote:
> From: Andrey Vagin <avagin@xxxxxxxxxx>
>
> Each namespace has an owning user namespace and now there is no way
> to discover these relationships.
>
> Pid and user namespaces are hierarchical. There is no way to discover
> parent-child relationships either.
>
> Why might we want to know the relationships between namespaces?
>
> One use would be visualization, in order to understand the running
> system. Another would be to answer the question: what capability does
> process X have to perform operations on a resource governed by namespace
> Y?
>
> One more use-case (which is usually called abnormal) is checkpoint/restart.
> In CRIU we are going to dump and restore nested namespaces.
>
> There [1] was a discussion about which interface to choose to determine
> relationships between namespaces.
>
> Eric suggested to add two ioctl-s [2]:
>> Grumble, Grumble. I think this may actually be a case for creating ioctls
>> for these two cases. Now that random nsfs file descriptors are bind
>> mountable the original reason for using proc files is not as pressing.
>>
>> One ioctl for the user namespace that owns a file descriptor.
>> One ioctl for the parent namespace of a namespace file descriptor.
>
> Here is an implementation of these ioctl-s.
>
> $ man man7/namespaces.7
> ...
> Since Linux 4.X, the following ioctl(2) calls are supported for
> namespace file descriptors. The correct syntax is:
>
> fd = ioctl(ns_fd, ioctl_type);
>
> where ioctl_type is one of the following:
>
> NS_GET_USERNS
> Returns a file descriptor that refers to an owning user namespace.
>
> NS_GET_PARENT
> Returns a file descriptor that refers to a parent namespace.
> This ioctl(2) can be used for pid and user namespaces. For
> user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
> meaning.
>
> In addition to generic ioctl(2) errors, the following specific ones
> can occur:
>
> EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
>
> EPERM The requested namespace is outside of the current namespace
> scope.
>
> [1] https://lkml.org/lkml/2016/7/6/158
> [2] https://lkml.org/lkml/2016/7/9/101

The following is the text I propose to add to the namespaces(7) page.
Could you please review and let me know of corrections and
improvements.

Thanks,

Michael


Introspecting namespace relationships
Since Linux 4.9, two ioctl(2) operations are provided to allow
introspection of namespace relationships (see user_namespaces(7)
and pid_namespaces(7)). The form of the calls is:

ioctl(fd, request);

In each case, fd refers to a /proc/[pid]/ns/* file.

NS_GET_USERNS
Returns a file descriptor that refers to the owning user
namespace for the namespace referred to by fd.

NS_GET_PARENT
Returns a file descriptor that refers to the parent namespace
of the namespace referred to by fd. This operation is
valid only for hierarchical namespaces (i.e., PID and user
namespaces). For user namespaces, NS_GET_PARENT is synonymous
with NS_GET_USERNS.

In each case, the returned file descriptor is opened with O_RDONLY
and O_CLOEXEC (close-on-exec).

By applying fstat(2) to the returned file descriptor, one obtains
a stat structure whose st_ino (inode number) field identifies the
owning/parent namespace. This inode number can be matched with
the inode number of another /proc/[pid]/ns/{pid,user} file to
determine whether that is the owning/parent namespace.
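
As a rough illustration of the matching step just described, the
following sketch compares a file descriptor with a namespace file
given by pathname. The function name fd_matches_ns_file() is
hypothetical, and the additional comparison of st_dev is an assumption
made here as a precaution, not something the text above requires:

```c
#include <sys/stat.h>

/* Return 1 if 'fd' refers to the same namespace as the file at
   'ns_path' (for example, "/proc/self/ns/user"), 0 if it does not,
   or -1 if fstat() or stat() fails. */
int
fd_matches_ns_file(int fd, const char *ns_path)
{
    struct stat fd_sb, path_sb;

    if (fstat(fd, &fd_sb) == -1)
        return -1;

    /* stat() follows the symbolic link in /proc/[pid]/ns, so the
       result describes the namespace itself */
    if (stat(ns_path, &path_sb) == -1)
        return -1;

    /* Matching st_ino is the check described in the text; matching
       st_dev as well is an extra precaution (an assumption here) */
    return fd_sb.st_dev == path_sb.st_dev &&
           fd_sb.st_ino == path_sb.st_ino;
}
```

A caller might pass the descriptor returned by NS_GET_USERNS together
with "/proc/self/ns/user" to ask whether the owning user namespace is
the caller's own user namespace.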

Either of these ioctl(2) operations can fail with the following
error:

EPERM The requested namespace is outside of the caller's namespace
scope. This error can occur if, for example, the owning
user namespace is an ancestor of the caller's current
user namespace. It can also occur on attempts to obtain
the parent of the initial user or PID namespace.

Additionally, the NS_GET_PARENT operation can fail with the
following error:

EINVAL fd refers to a nonhierarchical namespace.
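
Because EPERM is returned once the parent lies outside the caller's
scope, it gives a natural stopping point when walking up a PID or user
namespace hierarchy. The following is only a sketch of that idea:
count_visible_ancestors() is a hypothetical name, and the fallback
definition of NS_GET_PARENT simply mirrors the one used in the example
program.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef NS_GET_PARENT
#define NSIO 0xb7
#define NS_GET_PARENT _IO(NSIO, 0x2)
#endif

/* Count the ancestors of the PID or user namespace referred to by
   'ns_fd' that are visible to the caller. Returns that count once
   NS_GET_PARENT fails with EPERM, or -1 on any other error (for
   example, EINVAL for a nonhierarchical namespace), with errno set
   by ioctl(). The caller's 'ns_fd' is never closed. */
int
count_visible_ancestors(int ns_fd)
{
    int count = 0;
    int cur = ns_fd;

    for (;;) {
        int parent = ioctl(cur, NS_GET_PARENT);
        int saved_errno = errno;

        if (cur != ns_fd)
            close(cur);         /* Close intermediate descriptors only */

        if (parent == -1) {
            errno = saved_errno;
            return (errno == EPERM) ? count : -1;
        }

        count++;
        cur = parent;
    }
}
```

Called from the initial PID namespace on /proc/self/ns/pid, this would
be expected to return 0, since the first NS_GET_PARENT call fails with
EPERM as described above.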

See the EXAMPLE section for an example of the use of these
operations.

[...]

EXAMPLE
The example shown below uses the ioctl(2) operations described
above to perform simple introspection of namespace relationships.
The following shell sessions show various examples of the use of
this program.

Trying to get the parent of the initial user namespace fails, for
the reasons explained earlier:

$ ./ns_introspect /proc/self/ns/user p
The parent namespace is outside your namespace scope

Create a process running sleep(1) that resides in new user and UTS
namespaces, and show that the new UTS namespace is associated with the
new user namespace:

$ unshare -Uu sleep 1000 &
[1] 23235
$ ./ns_introspect /proc/23235/ns/uts
Inode number of owning user namespace is: 4026532448
$ readlink /proc/23235/ns/user
user:[4026532448]

Then show that the parent of the new user namespace in the preceding
example is the initial user namespace:

$ readlink /proc/self/ns/user
user:[4026531837]
$ ./ns_introspect /proc/23235/ns/user
Inode number of owning user namespace is: 4026531837

Start a shell in a new user namespace, and show that from within
this shell, the parent user namespace can't be discovered. Similarly,
the owning user namespace of the UTS namespace (which is the initial
user namespace) can't be discovered:

$ PS1="sh2$ " unshare -U bash
sh2$ ./ns_introspect /proc/self/ns/user p
The parent namespace is outside your namespace scope
sh2$ ./ns_introspect /proc/self/ns/uts u
The owning user namespace is outside your namespace scope

Program source

/* ns_introspect.c

Licensed under GNU General Public License v2 or later
*/
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <string.h>
#include <errno.h>

#ifndef NS_GET_USERNS
#define NSIO 0xb7
#define NS_GET_USERNS _IO(NSIO, 0x1)
#define NS_GET_PARENT _IO(NSIO, 0x2)
#endif

int
main(int argc, char *argv[])
{
    int fd, userns_fd, parent_fd;
    struct stat sb;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s /proc/[pid]/ns/[file] [p|u]\n",
                argv[0]);
        fprintf(stderr, "\nDisplay the result of one or both "
                "of NS_GET_USERNS (u) or NS_GET_PARENT (p)\n"
                "for the specified /proc/[pid]/ns/[file]. If neither "
                "'p' nor 'u' is specified,\n"
                "NS_GET_USERNS is the default.\n");
        exit(EXIT_FAILURE);
    }

    /* Obtain a file descriptor for the 'ns' file specified
       in argv[1] */

    fd = open(argv[1], O_RDONLY);
    if (fd == -1) {
        perror("open");
        exit(EXIT_FAILURE);
    }

    /* Obtain a file descriptor for the owning user namespace and
       then obtain and display the inode number of that namespace */

    if (argc < 3 || strchr(argv[2], 'u')) {
        userns_fd = ioctl(fd, NS_GET_USERNS);

        if (userns_fd == -1) {
            if (errno == EPERM)
                printf("The owning user namespace is outside "
                        "your namespace scope\n");
            else
                perror("ioctl-NS_GET_USERNS");
            exit(EXIT_FAILURE);
        }

        if (fstat(userns_fd, &sb) == -1) {
            perror("fstat-userns");
            exit(EXIT_FAILURE);
        }
        printf("Inode number of owning user namespace is: %ld\n",
                (long) sb.st_ino);

        close(userns_fd);
    }

    /* Obtain a file descriptor for the parent namespace and
       then obtain and display the inode number of that namespace */

    if (argc > 2 && strchr(argv[2], 'p')) {
        parent_fd = ioctl(fd, NS_GET_PARENT);

        if (parent_fd == -1) {
            if (errno == EINVAL)
                printf("Can't get parent namespace of a "
                        "nonhierarchical namespace\n");
            else if (errno == EPERM)
                printf("The parent namespace is outside "
                        "your namespace scope\n");
            else
                perror("ioctl-NS_GET_PARENT");
            exit(EXIT_FAILURE);
        }

        if (fstat(parent_fd, &sb) == -1) {
            perror("fstat-parentns");
            exit(EXIT_FAILURE);
        }
        printf("Inode number of parent namespace is: %ld\n",
                (long) sb.st_ino);

        close(parent_fd);
    }

    exit(EXIT_SUCCESS);
}


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/