Re: [RFC PATCH 0/4] namespacefs: Proof-of-Concept

From: James Bottomley
Date: Fri Nov 19 2021 - 07:45:11 EST


On Thu, 2021-11-18 at 14:24 -0500, Steven Rostedt wrote:
> On Thu, 18 Nov 2021 12:55:07 -0600
> ebiederm@xxxxxxxxxxxx (Eric W. Biederman) wrote:
>
> > It is not correct to use inode numbers as the actual names for
> > namespaces.
> >
> > I can not see anything else you can possibly use as names for
> > namespaces.
>
> This is why we used inode numbers.
>
> > To allow container migration between machines and similar things
> > you wind up needing a namespace for your names of namespaces.
>
> Is this why you say inode numbers are incorrect?

The problem is you seem to have picked on one orchestration system
without considering all the uses of namespaces and how this would
impact them. So let me explain why inode numbers are incorrect and it
will possibly illuminate some of the cans of worms you're opening.

We have a container checkpoint/restore system called CRIU that can be
used to snapshot the state of a pid subtree and restore it. It can be
used for the entire system or a piece of it. It is also used by some
orchestration systems to live migrate containers. Any property of a
container system that has meaning must be saved and restored by CRIU.

The inode number is simply a semi-random number assigned to the
namespace. It shows up in /proc/<pid>/ns but nowhere else and isn't
used by anything. When CRIU migrates or restores containers, all the
namespaces that compose them get different inode values on the restore.
If you want to make the inode number equivalent to the container name,
they'd have to restore to the previous number because you've made it a
property of the namespace. The way everything is set up now, that's
just not possible and never will be. Inode numbers are a 32-bit space
and can't be globally unique. If you want a container name, it will
have to be something like a new UUID and that's the first problem you
should tackle.
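
To make the point concrete, here is a minimal sketch (Python, Linux
only) of where those inode numbers come from and what a restorable name
might look like instead. The helper names are mine and purely
illustrative; new_container_name() in particular is hypothetical and
only stands in for "something like a new UUID". Nothing like it exists
in the kernel today.

```python
import os
import re
import uuid
from typing import Dict, Tuple

# Symlink targets in /proc/<pid>/ns look like "pid:[4026531836]".
_NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target: str) -> Tuple[str, int]:
    """Split a namespace symlink target into (type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def namespace_inodes(pid: str = "self") -> Dict[str, int]:
    """Map each entry in /proc/<pid>/ns to its inode number (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    inodes = {}
    for entry in os.listdir(ns_dir):
        _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        inodes[entry] = inode
    return inodes


def new_container_name() -> str:
    """Hypothetical name: a random 128-bit UUID that could be saved and restored."""
    return str(uuid.uuid4())


if __name__ == "__main__":
    # These values are assigned by the running kernel; a restore on another
    # machine would get different ones.
    if os.path.isdir("/proc/self/ns"):
        for name, inode in sorted(namespace_inodes().items()):
            print(f"{name:20s} {inode}")
    print("example restorable name:", new_container_name())
```

Run it on two machines, or before and after a reboot, and the inode
values are whatever that kernel handed out, whereas a UUID-style name is
just data that a checkpoint could carry along.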

James