Re: [RFC PATCH 0/4] namespacefs: Proof-of-Concept

From: Yordan Karadzhov
Date: Mon Nov 22 2021 - 10:00:39 EST

On 22.11.21 at 15:44, James Bottomley wrote:
Well, no, the information may not all exist. However, the point is we
can add it without adding additional namespace objects.

Let's look at the following case (oversimplified, just to get the idea):
1. The process X is a parent of the process Y and both are in
namespace 'A'.
3. "unshare" is used to place process Y (and all its child processes)
in a new namespace B (A is a parent namespace of B).
4. "setns" is used to move process X into namespace C.

How would you find the parent namespace of B?
Actually this one's quite easy: the parent of X in your setup still has
it.

Hmm, isn't that true only if we somehow know that (3) happened before (4)?

However, I think you're looking to set up a scenario where the
namespace information isn't carried by live processes and that's
certainly possible if we unshare the namespace, bind it to a mount
point and exit the process that unshared it. It will exist as a bound
namespace with no processes until it gets entered via the binding and
when that happens the parent information can't be deduced from the
process tree.
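
To make that limitation concrete, here is a minimal Python sketch of what
a tool can learn from live processes alone. It only reads the
/proc/<pid>/ns/<type> symlinks; the helper names parse_ns_link and
namespaces_of_live_processes and the proc_root parameter are made up for
illustration. A namespace that is kept alive solely by a bind mount has no
PID directory pointing at it, so it would never show up in the result.

```python
import os
import re
from collections import defaultdict

# Link targets under /proc/<pid>/ns look like "pid:[4026531836]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'pid:[4026531836]' into (type, inode)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespaces_of_live_processes(ns_type, proc_root="/proc"):
    """Map namespace inode -> sorted PIDs, using only <proc_root>/<pid>/ns/<ns_type>."""
    members = defaultdict(list)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "sys"
        try:
            target = os.readlink(os.path.join(proc_root, entry, "ns", ns_type))
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        _, inode = parse_ns_link(target)
        members[inode].append(int(entry))
    return {inode: sorted(pids) for inode, pids in members.items()}
```

Calling namespaces_of_live_processes("pid") on a running system lists only
the PID namespaces that currently have at least one visible member, which
is the gap described above.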

There's another problem, that I think you don't care about but someone
will at some point: the owning user_ns can't be deduced from the
current tree either because it depends on the order of entry. We fixed
unshare so that if you enter multiple namespaces, it enters the user_ns
first so that it is always the owning namespace, but if you enter
the rest of the namespaces first via one unshare then unshare the
user_ns second, that won't be true.
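
The ordering dependence can be illustrated with a toy model in Python. It
is not kernel code: the Namespace and Process classes are hypothetical, and
the only rule they encode is that a new namespace is owned by the user_ns
its creator is in at creation time, with unshare handling the user_ns
first when several namespaces are requested in one call.

```python
import itertools

_ids = itertools.count(1)


class Namespace:
    """Toy namespace: a kind, an id, and the user_ns that owns it."""

    def __init__(self, kind, owner):
        self.kind = kind
        self.owner = owner  # owning user namespace (None for the initial user_ns)
        self.ident = next(_ids)


class Process:
    """Toy process that tracks its current namespace of each kind."""

    def __init__(self, user_ns):
        self.ns = {"user": user_ns}

    def unshare(self, *kinds):
        # Within a single call the user_ns is created first, so the other
        # namespaces created by the same call are owned by the new user_ns.
        for kind in sorted(kinds, key=lambda k: k != "user"):
            self.ns[kind] = Namespace(kind, owner=self.ns["user"])


init_user_ns = Namespace("user", None)

# One call: the net namespace ends up owned by the process's new user_ns.
together = Process(init_user_ns)
together.unshare("net", "user")

# Two calls, user_ns second: the net namespace is still owned by the old user_ns.
separate = Process(init_user_ns)
separate.unshare("net")
separate.unshare("user")
```

In the second case the process's current user_ns and the owner of its net
namespace differ, so looking at the process alone does not reveal the owner.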

Neither of the above actually matters for Docker-like containers because
that's not the way the orchestration system works (it doesn't use mount
bindings or the user_ns) but one day, hopefully, it might.

Again, using your arguments, I can reformulate the problem statement
this way: a userspace program is well equipped to create an arbitrarily
complex tree of namespaces. At the same time, the only place where
information about the created structure can be retrieved is the
userspace program itself. And when multiple userspace programs add to
the namespace tree, the global picture becomes impossible to recover.

So figure out what's missing in the /proc tree and propose adding it.
The interface isn't immutable; it's just that what exists today is an
ABI and can't be altered. I think this is the last time we realised we
needed to add missing information in /proc/<pid>/ns:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eaa0d190bfe1ed891b814a52712dcd852554cb08

So you can use that as the pattern.


OK, if everybody agrees that adding extra information to /proc is the right way to go, we will be happy to try developing another PoC that implements this approach.

Thank you very much for all your help!
Yordan

James