Re: [RFC][PATCH 0/9] Make containers kernel objects

From: Ian Kent
Date: Tue May 23 2017 - 06:10:06 EST


On Mon, 2017-05-22 at 17:22 +0100, David Howells wrote:
> Here are a set of patches to define a container object for the kernel and
> to provide some methods to create and manipulate them.
>
> The reason I think this is necessary is that the kernel has no idea how to
> direct upcalls to what userspace considers to be a container - current
> Linux practice appears to make a "container" just an arbitrarily chosen
> junction of namespaces, control groups and files, which may be changed
> individually within the "container".
>
> The kernel upcall mechanism then needs to decide which set of namespaces,
> etc., it must exec the appropriate upcall program.  Examples of this
> include:
>
>  (1) The DNS resolver.  The DNS cache in the kernel should probably be
>      per-network namespace, but in userspace the program, its libraries and
>      its config data are associated with a mount tree and a user namespace
>      and it gets run in a particular pid namespace.
>
>  (2) NFS ID mapper.  The NFS ID mapping cache should also probably be
>      per-network namespace.
>
>  (3) nfsdcltrack.  A way for NFSD to access stable storage for tracking
>      of persistent state.  Again, network-namespace dependent, but also
>      perhaps mount-namespace dependent.
>
>  (4) General request-key upcalls.  Not particularly namespace dependent,
>      apart from keyrings being somewhat governed by the user namespace and
>      the upcall being configured by the mount namespace.
>
> These patches are built on top of the mount context patchset so that
> namespaces can be properly propagated over submounts/automounts.
>
> These patches implement a container object that holds the following things:
>
>  (1) Namespaces.
>
>  (2) A root directory.
>
>  (3) A set of processes, including a designated 'init' process.
>
>  (4) The creator's credentials, including ownership.
>
>  (5) A place to hang security for the container, allowing policies to be
>      set per-container.
>
> I also want to add:
>
>  (6) Control groups.
>
>  (7) A per-container keyring that can be added to from outside of the
>      container, even once the container is live, for the provision of
>      filesystem authentication/encryption keys in advance of the container
>      being started.

It's hard to decide which of these has the higher priority; I think both are
essential to a container implementation.

Ian