Re: Detecting if you are running in a container

From: Eric W. Biederman
Date: Tue Nov 01 2011 - 09:37:48 EST


"H. Peter Anvin" <hpa@xxxxxxxxx> writes:

> On 10/16/2011 02:42 AM, Eric W. Biederman wrote:
>>>
>>> Something based on UUIDs, perhaps?
>>>
>>> UUIDs are kind of exactly this, after all... a single namespace designed
>>> to be large and random enough to be globally unique without a central
>>> registration authority.
>>
>> mount --bind /proc/self/ns/net /var/run/netns/<name>
>>
>> When we want to refer to the namespace in syscalls we pass a file
>> descriptor we received from opening the namespace reference object.
>>
>> That moves the entire naming problem into the file namespace.
>>
>
> That doesn't solve what I think of as the *real* problem.

It avoids the need for a namespace of namespaces, and it avoids
requiring universal agreement between all filesystems on all operating
systems on how things should look.

By not precluding different solutions, it makes a large stride forward.
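
As an illustration of naming through the file namespace, here is a
minimal Python sketch. It assumes a namespace reference is simply a path
that can be opened, and that two references name the same namespace when
the opened objects share device and inode numbers (which is how current
kernels expose /proc/<pid>/ns/* entries). The helper names ns_identity
and same_namespace are hypothetical.

```python
import os

def ns_identity(path):
    """Open a namespace reference and return (st_dev, st_ino) for it."""
    fd = os.open(path, os.O_RDONLY)
    try:
        st = os.fstat(fd)
        return (st.st_dev, st.st_ino)
    finally:
        os.close(fd)

def same_namespace(path_a, path_b):
    """True if both paths refer to the same underlying object."""
    return ns_identity(path_a) == ns_identity(path_b)

if __name__ == "__main__":
    ref = "/proc/self/ns/net"
    # Only meaningful on a Linux system that exposes namespace references.
    if os.path.exists(ref):
        print(same_namespace(ref, ref))
```

The point of the sketch is that no separate registry of namespace names
is needed: any path that reaches the reference object, including a bind
mount of it, serves as a name.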

> The real problem is just another instance of what I sometimes refer to
> as the "alien metadata problem": the alien metadata problem (which crops
> up in *all kinds* of contexts, including containers, namespaces, virtual
> machines, building distribution disk images, and backups) is the fact
> that you would like to be able to store, manipulate and preserve, on
> disk and in a mounted filesystem, a set of metadata which may not be the
> "currently active" metadata.

When you throw in network filesystems with different notions of
meta-data, things get even more interesting.

> There are two forms of "solutions" to this: one where the filesystem
> still only contains one set of metadata, but it is not currently active,
> and one where the filesystem contains multiple sets of metadata for the
> same files at the same time, any one of which can be active (and
> different ones may be active for different namespaces.)

There is an important tool that seems to be missing from your toolbox.
- Mapping the metadata on the file into different contexts.

The way I see it, classic unix filesystems have exactly one context
that their meta-data is expected to work in: the context in which
the filesystem is mounted.

However it is very easy to conceive of that context being specified
at a per inode granularity. In which case at least the backup and
the distribution disk image problem can be solved by trivially
specifying a different context, and associating a user namespace with
that context. Then you switch into the user namespace to manipulate
"alien metadata".

Where mapping comes in is when those files are accessed from
another context besides the one where all of their metadata
falls. At which point you can map all of the files to be owned
by the user who is responsible for making backups. The mapping
is a bit like the root squash setting.
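
A minimal sketch of that squash-style mapping, assuming each file's
ownership is tagged with the name of the context its ids belong to. The
Owner type, the map_owner function, and the context names are
hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Owner:
    context: str  # name of the context (user namespace) the ids belong to
    uid: int
    gid: int

def map_owner(owner, viewer_context, squash_uid, squash_gid):
    """Return the ownership of a file as seen from viewer_context.

    Inside the file's own context the ids are shown unchanged. From any
    other context every file is shown as owned by one designated user,
    much like root squash collapses remote root to a single local id.
    """
    if owner.context == viewer_context:
        return owner
    return Owner(viewer_context, squash_uid, squash_gid)

if __name__ == "__main__":
    alien = Owner("hpa-backups", 0, 0)
    # Seen from the hypothetical "host" context, uid 0 becomes uid 1000.
    print(map_owner(alien, "host", 1000, 1000))
    # Seen from inside its own context, the ownership is unchanged.
    print(map_owner(alien, "hpa-backups", 1000, 1000))
```

The on-disk ids never change in this sketch. Only the view from outside
the context is collapsed to the backup user.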


For the common case I expect we will settle on a well defined acl across
the native unix filesystems that allows us to make this persistent. For
network filesystems, with their broader interoperability requirements,
how to specify this gets a little more interesting.

For purposes of implementation it doesn't matter to me if that acl is
a uuid or a unique string. For management of the data it might.

How I expect a native linux filesystem to work when it encounters a
filesystem with a user namespace acl is that it will work like nfsv4
and do an upcall into userspace, to ask the appropriate userspace
agent: how do I understand this acl? The userspace mapping agent will
say: Oh. You want the user namespace for "hpa-backups"? Let's see:
/var/run/userns/hpa-backups exists, let me just tell the kernel about
that mapping. Or perhaps the user namespace does not exist, so the
mapping daemon would go out and create it by consulting configuration
files in /etc to know that everything in "hpa-backups" should be a child
user namespace, with the user "hpa" being able to switch into that
user namespace without root permission.
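
The decision the mapping agent makes can be sketched in Python as
follows. This is only an illustration under stated assumptions: the
function resolve_userns, the shape of the config dictionary, and its
keys are all hypothetical, and the sketch neither talks to the kernel
nor creates a namespace. It only shows the lookup order described above.

```python
import os

def resolve_userns(name, run_dir, config):
    """Answer an upcall asking how to interpret the acl called `name`.

    Returns ("existing", path) when a reference already exists under
    run_dir, or ("create", settings) when the namespace would have to be
    created from configuration. Raises KeyError for an unknown name.
    """
    # Refuse names that could escape run_dir.
    if "/" in name or name in ("", ".", ".."):
        raise ValueError("invalid namespace name: %r" % (name,))
    path = os.path.join(run_dir, name)
    if os.path.exists(path):
        return ("existing", path)
    # Hypothetical parsed configuration, e.g. loaded from files in /etc.
    return ("create", config[name])

if __name__ == "__main__":
    config = {"hpa-backups": {"child_of": "init", "allowed_users": ["hpa"]}}
    print(resolve_userns("hpa-backups", "/var/run/userns", config))
```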

Files with meta-data for more than one usernamespace/context I expect
to work similarly. Care needs to be taken that it doesn't drive the
administrator crazy.

Eric