Re: [CRIU] Introspecting userns relationships to other namespaces?

From: Michael Kerrisk (man-pages)
Date: Fri Jul 08 2016 - 07:18:22 EST

On 07/08/2016 05:26 AM, James Bottomley wrote:
On Thu, 2016-07-07 at 20:00 -0700, Andrew Vagin wrote:
On Thu, Jul 07, 2016 at 07:16:18PM -0700, Andrew Vagin wrote:
On Thu, Jul 07, 2016 at 12:17:35PM -0700, James Bottomley wrote:
On Thu, 2016-07-07 at 20:21 +0200, Michael Kerrisk (man-pages) wrote:
On 7 July 2016 at 17:01, James Bottomley
<James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
[Serge already answered the parenting issue]
On Thu, 2016-07-07 at 08:36 -0500, Serge E. Hallyn wrote:
Hm. Probably best-effort based on the process hierarchy.
yeah you could probably get a tree into a state that would be
wrongly recreated. Create a new netns, bind mount it, exit;
another task create a new user_ns, bind mount it, exit;
task setns()s first to the new netns then to the new user_ns. I
suspect criu will recreate that wrongly.

This is a bit pathological, and you have to be root to do it:
root can set up a nesting hierarchy, bind it and destroy the
but I know of no current orchestration system which does

Actually, I have to back pedal a bit: the way I currently set
architecture emulation containers does precisely this: I set
up the
namespaces unprivileged with child mount namespaces, but then
I ask
root to bind the userns and kill the process that created it
so I
have a permanent handle to enter the namespace by, so I
that when our current orchestration systems get more
they might eventually want to do something like this as well.

In theory, we could get nsfs to show this information as an
(just add a show_options entry to the superblock ops), but
problem is that although each namespace has a parent user_ns,
there's no way to get it without digging in the namespace
structure. Probably we should restructure to move it into
ns_common, then we could display it (and enforce all
having owning user_ns) but it would be a

I'm missing something here. Is it not already the case that all
namespaces have an owning user_ns?

Um, yes, I don't believe I said they don't. The problem I thought you
were having is that there's no way of seeing what it is.

nsfs is the Namespace filesystem where bound namespaces appear to a cat
of /proc/self/mounts. It can display any information that's in
ns_common (the common core of namespaces) but the owning user_ns
pointer currently isn't in this structure. Every user namespace has a
pointer to it, but they're all privately embedded in the
namespace specific structures. What I was proposing was that
every current namespace has a pointer somewhere to the owning
namespace, we could abstract this out into ns_common so it's now
accessible to be displayed by nsfs, probably as a mount option.

James, I am not sure that I understood you correctly. We have one
file system for all namespace files, how we can show per-file
in mount options. I think we can show all required information in
fdinfo. We open a namespaces file (/proc/pid/ns/N) and then read
/proc/pid/fdinfo/X for it.

Here is a proof-of-concept patch.

How it works:

In [1]: import os

In [2]: fd = os.open("/proc/self/ns/pid", os.O_RDONLY)

In [3]: print open("/proc/self/fdinfo/%d" % fd).read()
pos: 0
flags: 0100000
mnt_id: 2
userns: 4026531837

In [4]: print "/proc/self/ns/user -> %s" % os.readlink("/proc/self/ns/user")
/proc/self/ns/user -> user:[4026531837]

can't you just do

readlink /proc/self/ns/user | sed 's/.*\[\(.*\)\]/\1/'
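
For comparison with the Python session above, the same extraction can be sketched in Python 3. The helper name ns_inode is hypothetical (it is not part of any patch in this thread), and the sketch assumes the link target has the "type:[inode]" form shown above.

```python
import os
import re


def ns_inode(link_target):
    """Split a namespace link such as 'user:[4026531837]' into (type, inode).

    ns_inode is a hypothetical helper name used only for illustration.
    """
    match = re.fullmatch(r"(\w+):\[(\d+)\]", link_target)
    if match is None:
        raise ValueError("not a namespace link: %r" % (link_target,))
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    try:
        # Same information as the readlink | sed pipeline above.
        print(ns_inode(os.readlink("/proc/self/ns/user")))
    except (OSError, ValueError) as exc:
        print("cannot inspect /proc/self/ns/user:", exc)
```

Like the shell one-liner, this only identifies the namespace the link points at; it says nothing about which user_ns owns it.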


But what Michael was asking about was the parent user_ns of all the
other namespaces ...

Just to reiterate, what I'm interested in is the introspection use
case (but there's clearly several other interesting use cases here).
The idea is to be able to answer these questions

1. For each userns, what is the parent of that userns?

2. For each non-user namespace, what is the owning userns?

This enables us to understand the userns hierarchy, which
matters in terms of answering the question: what capabilities
does process X have in namespace Y?
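
To make the use case concrete, here is a minimal sketch of how answers to the two questions could be combined, assuming they were already available as two plain dictionaries. All names and identifiers below are hypothetical, and the check is deliberately simplified: it only tests whether a process's userns is the owning userns of the target namespace or one of its ancestors, and ignores the process's effective capability set and the rule about the creator of a userns.

```python
def is_ancestor_or_self(candidate, userns, parent):
    """Walk up the parent chain from userns; True if candidate is reached.

    parent maps each userns to its parent userns (question 1); the
    initial userns has no entry.
    """
    seen = set()
    while userns is not None and userns not in seen:
        if userns == candidate:
            return True
        seen.add(userns)  # guard against malformed (cyclic) input
        userns = parent.get(userns)
    return False


def may_hold_capabilities(proc_userns, target_ns, owner, parent):
    """Simplified check: is proc_userns the owner of target_ns or an ancestor of it?

    owner maps each non-user namespace to its owning userns (question 2).
    """
    owning = owner.get(target_ns)
    if owning is None:
        return False
    return is_ancestor_or_self(proc_userns, owning, parent)


if __name__ == "__main__":
    # Made-up hierarchy for illustration: init_user -> u1 -> u2.
    parent = {"u1": "init_user", "u2": "u1"}
    owner = {"net_a": "u2", "net_b": "init_user"}
    print(may_hold_capabilities("u1", "net_a", owner, parent))
    print(may_hold_capabilities("u2", "net_b", owner, parent))
```

The point of the sketch is only that both mappings are needed: without the owner of each non-user namespace and the parent of each userns, the walk cannot be done from user space.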


Michael Kerrisk
Linux man-pages maintainer;
Linux/UNIX System Programming Training: