Re: [RFC][PATCH 0/9] Make containers kernel objects

From: David Howells
Date: Tue May 23 2017 - 12:13:37 EST


Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:

> Let me suggest a concrete alternative:
>
> - At the time of mount observe the mounter's user namespace.

Looking at sget(), I don't think a mounter can see a superblock outside of
their user namespace. There is something icky in there whereby all automounts
are currently transferred into the init_user_ns though (something to fix in
my mount-context series) :-/

> - Find the mounter's pid namespace.
> - If the mounter's pid namespace is owned by the mounter's user namespace,
>   walk up the pid namespace tree to the first pid namespace owned by
>   that user namespace.
> - If the mounter's pid namespace is not owned by the mounter's user
>   namespace, fail the mount, as the upcalls it is going to need to make
>   will not be possible.

Take the following scenario:

 (1) Create a process with a new network namespace. Set up the network to
     route out of ethernet port 1.

 (2) Create a child process with new network and user namespaces. Set up the
     network to route out of ethernet port 2.

 (3) Mount an NFS volume in the process created in (2).

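The mount in (3) will fail unconditionally: the process created in (2) has a
new user namespace but is still in its parent's pid namespace, which that
user namespace does not own.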
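
To make that concrete, here is a toy userspace model of the proposed rule,
not kernel code. The class and function names (UserNs, PidNs,
pick_upcall_pid_ns) are made up for illustration, and the "walk up" step is
my reading of the proposal: keep going while the parent pid namespace is
owned by the same user namespace.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class UserNs:
    name: str


@dataclass(eq=False)
class PidNs:
    name: str
    owner: UserNs                      # user namespace that owns this pid ns
    parent: Optional["PidNs"] = None


class MountError(Exception):
    pass


def pick_upcall_pid_ns(mounter_user_ns: UserNs, mounter_pid_ns: PidNs) -> PidNs:
    """Hypothetical model of the proposed rule for choosing the pid ns."""
    if mounter_pid_ns.owner is not mounter_user_ns:
        # Proposed rule: upcalls are not possible, so fail the mount.
        raise MountError("pid namespace not owned by mounter's user namespace")
    ns = mounter_pid_ns
    # Assumed reading of "walk up": stop at the last ancestor with this owner.
    while ns.parent is not None and ns.parent.owner is mounter_user_ns:
        ns = ns.parent
    return ns


# Scenario: neither step creates a pid namespace, so everyone stays in the
# initial one, which is owned by the initial user namespace.
init_user = UserNs("init_user_ns")
init_pid = PidNs("init_pid_ns", owner=init_user)

# (2) creates a new user namespace; the pid namespace is unchanged.
child_user = UserNs("user_ns_2")

# (3) the mount is attempted from the process created in (2).
try:
    pick_upcall_pid_ns(child_user, init_pid)
    result = "mounted"
except MountError:
    result = "mount fails"

print(result)
```

Network namespaces do not appear in the model at all, which is rather the
point: the rule never looks at them.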

> - Hold a reference to the pid namespace that was found.

Take the following scenario:

 (1) Create a process with new network and pid namespaces. Set up the network
     to route out of ethernet port 1.

 (2) Create a child process with new network and pid namespaces. Set up the
     network to route out of ethernet port 2.

 (3) Mount an NFS volume in the process created in (2).

 (4) Create another child process with new network and pid namespaces. Set up
     the network to route out of ethernet port 3.

 (5) In the process created in (4), access the NFS volume created in (3).

The user namespace is the same all the way through.

Now you're holding a ref to the pid namespace created in (1) - but that is of
no use to you. The upcall must take place in the network namespace that
routes out through port 2.
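
A toy model of this second scenario, again plain userspace Python rather
than anything from the kernel. The names are hypothetical. It assumes the
walk stops at the pid namespace from (1), that the process in (4) is a child
of the one in (2), and that an upcall run "in" a pid namespace would use the
network namespace of the process that created it (the init_net field below).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class PidNs:
    name: str
    owner: str                         # name of the owning user namespace
    init_net: str                      # net ns of the creating process (assumed)
    parent: Optional["PidNs"] = None


def topmost_owned(ns: PidNs, user_ns: str) -> PidNs:
    """Walk up while the parent pid ns has the same owner (assumed reading)."""
    while ns.parent is not None and ns.parent.owner == user_ns:
        ns = ns.parent
    return ns


# Same user namespace all the way through.
pid1 = PidNs("pid_ns_1", "user_ns", "net_port1")               # step (1)
pid2 = PidNs("pid_ns_2", "user_ns", "net_port2", parent=pid1)  # step (2)
pid3 = PidNs("pid_ns_3", "user_ns", "net_port3", parent=pid2)  # step (4)

mount_net = pid2.init_net              # (3): the NFS mount routes via port 2
held = topmost_owned(pid2, "user_ns")  # what the proposal would hold a ref to
accessor_net = pid3.init_net           # (5): the accessor routes via port 3

print(held.name, held.init_net, mount_net, accessor_net)
```

Neither the held pid namespace nor the accessor's namespaces lead back to the
port 2 network namespace; only the mount itself knows about that.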

David