Re: [PATCH 00/23] proc: Introduce /proc/namespaces/ directory to expose namespaces lineary
From: Eric W. Biederman
Date: Mon Aug 17 2020 - 14:54:11 EST
Creating names in the kernel for namespaces is very difficult and
problematic. I have not seen anything suggesting that all of the
problems with restoring these new names have been solved.
When the filter for your list of namespaces is the user namespace,
creating a new directory in proc is highly questionable.
As everyone uses proc, placing this functionality in proc also amplifies
the problem of creating names.
Rather than using proc, having a way to mount a namespace filesystem
filtered by the user namespace of the mounter is likely to have many,
many fewer problems, especially as we are limiting/not allowing new
non-process things and ideally finding a way to remove the non-process
things.
Kirill, you have a good point that taking the case where a pid namespace
does not exist in a user namespace is likely quite unrealistic.
Kirill mentioned upthread that the list of namespaces is the list that
can appear in a container. Except by discipline in creating containers,
it is not possible to know which namespaces may appear attached to a
process. It is possible to be very creative with setns and violate any
constraint you may have. Which means your filtered list of namespaces
may not contain all of the namespaces used by a set of processes. This
further argues that attaching the list of namespaces to proc does not
make sense.
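A toy model of the point above, purely illustrative. The classes, the
identifiers, and the "init_userns"/"container_userns" labels are all
made up here; nothing below is a kernel interface. It only shows that a
list filtered by owning user namespace can miss a namespace that a
process in the set is actually attached to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Namespace:
    kind: str    # namespace type, e.g. "net" or "pid"
    ident: int   # made-up identifier, for illustration only
    owner: str   # label of the owning user namespace (hypothetical)


@dataclass(frozen=True)
class Process:
    pid: int
    namespaces: frozenset  # the Namespace objects this process is attached to


def owned_by(all_namespaces, userns):
    """The filtered list: namespaces owned by the given user namespace."""
    return {ns for ns in all_namespaces if ns.owner == userns}


def used_by(processes):
    """Every namespace actually attached to some process in the set."""
    return {ns for p in processes for ns in p.namespaces}


host_net = Namespace("net", 1, owner="init_userns")
ct_pid = Namespace("pid", 2, owner="container_userns")
ct_net = Namespace("net", 3, owner="container_userns")

procs = [
    Process(100, frozenset({ct_pid, ct_net})),
    # Same pid namespace, but a net namespace owned by a different user
    # namespace (one way to end up here is selective use of setns).
    Process(101, frozenset({ct_pid, host_net})),
]

listed = owned_by({host_net, ct_pid, ct_net}, "container_userns")
missing = used_by(procs) - listed
print(sorted((ns.kind, ns.ident) for ns in missing))  # [('net', 1)]
```

The namespace reported as missing is in use by process 101 yet never
shows up in the list filtered for "container_userns".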
Andrei has a good point that placing the names in a hierarchy by
user namespace has the potential to create more freedom when
assigning names to namespaces, as it means the names for namespaces
do not need to be globally unique, while still allowing the names
to stay the same.
To recap, the possibilities for names for namespaces that I have seen
mentioned in this thread are:
- Names per mount
- Names per user namespace
I personally suspect that names per mount are likely to be so flexible
they are confusing, while names per user namespace are likely to be
rigid, possibly too rigid to use.
It all depends upon how everything is used. I have yet to see a
complete story of how these names will be generated and used. So I can
not really judge.
Let me add another take on this idea that might give this work a path
forward. If I were solving this I would explore giving nsfs directories
per user namespace, and a way to mount it that exposed the directory of
the mounter's current user namespace (something like btrfs snapshots).
Hmm. For the user namespace directory I think I would give it a file
"ns" that can be opened to get a file handle on the user namespace.
Plus a set of subdirectories ("cgroup", "ipc", "mnt", "net", "pid",
"user", "uts"), one for each type of namespace. In each directory I think
I would just have a 64-bit counter, and I would assign each new entry the
next number from that counter.
The restore could either have the ability to rename files or simply the
ability to bump the counter (like we do with pids) so the names of the
namespaces can be restored.
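A minimal sketch of the naming scheme described above, as a userspace
model rather than kernel code. The class name UserNsDir, the methods
create() and bump(), and the choice to start each counter at 1 are
assumptions made for illustration. It models one directory per user
namespace, one counter per type subdirectory, and the "bump the
counter" variant of restore.

```python
NS_TYPES = ("cgroup", "ipc", "mnt", "net", "pid", "user", "uts")
U64_MAX = 2**64 - 1


class UserNsDir:
    """Hypothetical model of one per-user-namespace directory."""

    def __init__(self):
        # One counter per type subdirectory; starting at 1 is an assumption.
        self._next = {t: 1 for t in NS_TYPES}
        self.entries = {t: set() for t in NS_TYPES}

    def create(self, ns_type):
        """Name a new namespace with the next number from its directory's counter."""
        name = self._next[ns_type]
        if name > U64_MAX:
            raise OverflowError("64-bit counter exhausted")
        self._next[ns_type] = name + 1
        self.entries[ns_type].add(name)
        return name

    def bump(self, ns_type, value):
        """Restore helper: move the counter forward so the next create() returns value."""
        if value < self._next[ns_type]:
            raise ValueError("counter can only move forward")
        self._next[ns_type] = value


# Names are per directory, so two user namespaces can both have "net/1".
a, b = UserNsDir(), UserNsDir()
print(a.create("net"), b.create("net"))  # 1 1

# Restore: bump the counter, then create, to get the old name back.
restored = UserNsDir()
restored.bump("net", 7)
print(restored.create("net"))  # 7
```

Allowing the counter to move only forward keeps a restored name from
colliding with an entry that already exists in the same directory; the
rename variant would need its own collision check.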
That winds up making a user namespace the namespace of namespaces, so
I am not 100% sure about the idea.
Eric