Re: [CRIU] Introspecting userns relationships to other namespaces?

From: Michael Kerrisk (man-pages)
Date: Fri Jul 08 2016 - 07:18:22 EST


On 07/08/2016 05:26 AM, James Bottomley wrote:
On Thu, 2016-07-07 at 20:00 -0700, Andrew Vagin wrote:
On Thu, Jul 07, 2016 at 07:16:18PM -0700, Andrew Vagin wrote:
On Thu, Jul 07, 2016 at 12:17:35PM -0700, James Bottomley wrote:
On Thu, 2016-07-07 at 20:21 +0200, Michael Kerrisk (man-pages) wrote:
On 7 July 2016 at 17:01, James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
[Serge already answered the parenting issue]
On Thu, 2016-07-07 at 08:36 -0500, Serge E. Hallyn wrote:
Hm. Probably best-effort based on the process hierarchy. So yeah you
could probably get a tree into a state that would be wrongly recreated.
Create a new netns, bind mount it, exit; Have another task create a new
user_ns, bind mount it, exit; Third task setns()s first to the new netns
then to the new user_ns. I suspect criu will recreate that wrongly.

This is a bit pathological, and you have to be root to do it: so root
can set up a nesting hierarchy, bind it and destroy the pids but I know
of no current orchestration system which does this.

Actually, I have to back pedal a bit: the way I currently set up
architecture emulation containers does precisely this: I set up the
namespaces unprivileged with child mount namespaces, but then I ask
root to bind the userns and kill the process that created it so I have
a permanent handle to enter the namespace by, so I suspect that when
our current orchestration systems get more sophisticated, they might
eventually want to do something like this as well.

In theory, we could get nsfs to show this information as an option
(just add a show_options entry to the superblock ops), but the problem
is that although each namespace has a parent user_ns, there's no way to
get it without digging in the namespace specific structure. Probably we
should restructure to move it into ns_common, then we could display it
(and enforce all namespaces having owning user_ns) but it would be a

I'm missing something here. Is it not already the case that all
namespaces have an owning user_ns?

Um, yes, I don't believe I said they don't. The problem I thought you
were having is that there's no way of seeing what it is.

nsfs is the Namespace filesystem where bound namespaces appear to a cat
of /proc/self/mounts. It can display any information that's in
ns_common (the common core of namespaces) but the owning user_ns
pointer currently isn't in this structure. Every user namespace has a
pointer to it, but they're all privately embedded in the individual
namespace specific structures. What I was proposing was that since
every current namespace has a pointer somewhere to the owning user
namespace, we could abstract this out into ns_common so it's now
accessible to be displayed by nsfs, probably as a mount option.

James, I am not sure that I understood you correctly. We have one file
system for all namespace files, so how can we show per-file properties
in mount options? I think we can show all required information in
fdinfo. We open a namespace file (/proc/pid/ns/N) and then read
/proc/pid/fdinfo/X for it.

Here is a proof-of-concept patch.

How it works:

In [1]: import os

In [2]: fd = os.open("/proc/self/ns/pid", os.O_RDONLY)

In [3]: print open("/proc/self/fdinfo/%d" % fd).read()
pos: 0
flags: 0100000
mnt_id: 2
userns: 4026531837

In [4]: print "/proc/self/ns/user -> %s" % os.readlink("/proc/self/ns/user")
/proc/self/ns/user -> user:[4026531837]
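
For illustration, the `userns:` line in the output above could be picked out of an fdinfo dump roughly as follows. This is a minimal Python 3 sketch that assumes the field name and layout printed by the proof-of-concept patch; `parse_fdinfo` and `owning_userns` are hypothetical helper names, and on a kernel without the patch the `userns` field will simply be absent.

```python
def parse_fdinfo(text):
    """Parse 'key: value' lines from /proc/<pid>/fdinfo/<fd> into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


def owning_userns(text):
    """Return the inode number from the 'userns' field, or None if absent.

    Assumption: the 'userns' field exists only with the proof-of-concept
    patch discussed in this thread.
    """
    value = parse_fdinfo(text).get("userns")
    return int(value) if value is not None else None


# Sample text copied from the session above.
sample = "pos: 0\nflags: 0100000\nmnt_id: 2\nuserns: 4026531837\n"
print(owning_userns(sample))
```

Returning None for a missing field lets a caller distinguish an unpatched kernel from a parse error.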

can't you just do

readlink /proc/self/ns/user | sed 's/.*\[\(.*\)\]/\1/'

?
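
The same extraction can be sketched in Python 3 for callers who want the namespace type as well as the inode number. The helper names `parse_ns_link` and `ns_inode` are hypothetical; the only assumption about the kernel is the `type:[inode]` link format shown earlier in the thread.

```python
import os
import re

# Matches link targets such as "user:[4026531837]".
NS_LINK_RE = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split 'user:[4026531837]' into ('user', 4026531837)."""
    match = NS_LINK_RE.match(target)
    if match is None:
        raise ValueError("not a namespace link: %r" % target)
    return match.group(1), int(match.group(2))


def ns_inode(pid, ns_type):
    """Read the /proc/<pid>/ns/<ns_type> symlink and return its inode number."""
    target = os.readlink("/proc/%s/ns/%s" % (pid, ns_type))
    return parse_ns_link(target)[1]


print(parse_ns_link("user:[4026531837]"))
```

On a Linux system, `ns_inode("self", "user")` would return the inode number of the caller's own user namespace.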

But what Michael was asking about was the parent user_ns of all the
other namespaces ...

Just to reiterate, what I'm interested in is the introspection use
case (but there's clearly several other interesting use cases here).
The idea is to be able to answer these questions:

1. For each userns, what is the parent of that userns?

2. For each non-user namespace, what is the owning userns?

This enables us to understand the userns hierarchy, which
matters in terms of answering the question: what capabilities
does process X have in namespace Y?
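
To make the use of those two answers concrete, here is a rough Python 3 sketch of the kind of check they would feed. It assumes both relationships have already been collected into plain dictionaries (`userns_parent` and `ns_owner` are hypothetical names, and the identifiers in them are made up). It only tests whether a process's user namespace is the owning user namespace of a target namespace or an ancestor of it. A real capability decision also depends on the process's effective capability set and on which UID created each user namespace, which the sketch ignores.

```python
def is_same_or_ancestor(candidate, userns, userns_parent):
    """Return True if `candidate` is `userns` or one of its ancestors."""
    current = userns
    while current is not None:
        if current == candidate:
            return True
        # The initial user namespace maps to None, which ends the walk.
        current = userns_parent.get(current)
    return False


def could_hold_caps_over(proc_userns, target_ns, ns_owner, userns_parent):
    """Hierarchy-only test: is proc_userns at or above target_ns's owner?"""
    owner = ns_owner[target_ns]
    return is_same_or_ancestor(proc_userns, owner, userns_parent)


# Hypothetical data: answers to question 1 and question 2 respectively.
userns_parent = {100: None, 200: 100, 300: 200}
ns_owner = {"net:[10]": 100, "net:[20]": 300}

print(could_hold_caps_over(200, "net:[20]", ns_owner, userns_parent))
print(could_hold_caps_over(200, "net:[10]", ns_owner, userns_parent))
```

With this data the first call reports True, because user namespace 200 is the parent of the owner 300, and the second reports False, because the owner 100 sits above 200 in the hierarchy.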

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/