Re: [PATCH 2/9] Implement containers as kernel objects

From: Serge E. Hallyn
Date: Wed Sep 06 2017 - 10:03:36 EST


Quoting Richard Guy Briggs (rgb@xxxxxxxxxx):
...
> > I believe we are going to need a container ID to container definition
> > (namespace, etc.) mapping mechanism regardless of if the container ID
> > is provided by userspace or a kernel generated serial number. This
> > mapping should be recorded in the audit log when the container ID is
> > created/defined.
>
> Agreed.
>
> > > As was suggested in one of the previous threads, if there are any events not
> > > associated with a task (incoming network packets) we log the namespace ID and
> > > then only concern ourselves with its container serial number or container name
> > > once it becomes associated with a task at which point that tracking will be
> > > more important anyways.
> >
> > Agreed. After all, a single namespace can be shared between multiple
> > containers. For those security officers who need to track individual
> > events like this they will have the container ID mapping information
> > in the logs as well so they should be able to trace the unassociated
> > event to a set of containers.
> >
> > > I'm not convinced that a userspace or kernel generated UUID is that useful
> > > since they are large, not human readable and may not be globally unique given
> > > the "pets vs cattle" direction we are going with potentially identical
> > > conditions in hosts or containers spawning containers, but I see no need to
> > > restrict them.
> >
> > From a kernel perspective I think an int should suffice; after all,
> > you can't have more containers than you have processes. If the
> > container engine requires something more complex, it can use the int
> > as input to its own mapping function.
>
> PIDs roll over. That already causes some ambiguity in reporting. If a
> system is constantly spawning and reaping containers, especially
> single-process containers, I don't want to have to worry about that ID
> rolling to keep track of it even though there should be audit records of
> the spawn and death of each container. There isn't significant cost
> added here compared with some of the other overhead we're dealing with.

Strawman proposal:

1. Each clone/unshare/setns involving a namespace type generates an audit
message along the lines of:

PID 9512 (pid in init_pid_ns) in auditnsid 00000001 cloned CLONE_NEWNS|CLONE_NEWNET
new auditnsid: 00000002
associated namespaces: (list of all namespace filesystem inode numbers)

2. Userspace (i.e. the container logging daemon here) can watch the audit log
for all messages relating to auditnsid 00000002. Presumably there will be
messages along the lines of "PID 9513 in auditnsid 00000002 cloned...". The
container logging daemon can track those messages and add the new auditnsids
to the list it watches.

3. If a container is migrated (checkpointed and restored here or elsewhere),
userspace can just follow the appropriate logs for the new containers.
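
The tracking described in step 2 could look roughly like the sketch below.
It assumes the record wording from the strawman in step 1, which is only a
proposal and not an existing audit record format. The function name track()
and both regular expressions are made up for illustration.

```python
import re

# Hypothetical record format, copied from the strawman wording above.
# No such audit record exists; adjust the patterns to whatever is adopted.
CLONE_RE = re.compile(
    r"PID (\d+)(?: \([^)]*\))? in auditnsid ([0-9a-fA-F]+) cloned (\S+)"
)
NEW_RE = re.compile(r"new auditnsid: ([0-9a-fA-F]+)")


def track(lines, root_nsid):
    """Return root_nsid plus every auditnsid created from a watched one."""
    watched = {root_nsid}
    parent = None  # auditnsid of the last clone record, if it is watched
    for line in lines:
        m = CLONE_RE.search(line)
        if m:
            nsid = m.group(2)
            # Only remember the clone if it happened inside a watched nsid.
            parent = nsid if nsid in watched else None
            continue
        m = NEW_RE.search(line)
        if m and parent is not None:
            # The new auditnsid descends from a watched one: watch it too.
            watched.add(m.group(1))
            parent = None
    return watched


log = [
    "PID 9512 (pid in init_pid_ns) in auditnsid 00000001 cloned CLONE_NEWNS|CLONE_NEWNET",
    "new auditnsid: 00000002",
    "associated namespaces: (list of inode numbers)",
    "PID 9513 in auditnsid 00000002 cloned CLONE_NEWPID",
    "new auditnsid: 00000003",
    "PID 77 in auditnsid 00000001 cloned CLONE_NEWUTS",
    "new auditnsid: 00000004",
]
container_nsids = track(log, "00000002")
```

Here a daemon that starts out watching 00000002 picks up 00000003 and
ignores 00000004, which was created from outside the container.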

Userspace does not ever *request* an auditnsid. They are ephemeral, just a
tool to track the namespaces through the audit log. They are, however,
guaranteed never to be re-used until reboot.
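
Whether "never re-used until reboot" can hold depends on the counter width,
which the proposal does not specify. The eight hex digits in the example
suggest 32 bits, but that is a guess. A back-of-the-envelope check follows,
with the creation rates chosen purely for illustration:

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def seconds_until_wrap(bits, ids_per_second):
    """Time for a counter of the given width to wrap at a steady rate."""
    return 2 ** bits / ids_per_second


# Assumed rates, not measurements: 1,000 new auditnsids per second for a
# 32-bit counter, 1,000,000 per second for a 64-bit one.
days_32 = seconds_until_wrap(32, 1_000) / SECONDS_PER_DAY
years_64 = seconds_until_wrap(64, 1_000_000) / SECONDS_PER_YEAR

print(f"32-bit wraps after about {days_32:.1f} days")
print(f"64-bit wraps after about {years_64:,.0f} years")
```

Under these assumptions a 32-bit counter wraps in under two months of
uptime, while a 64-bit counter does not wrap in any realistic uptime.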

(Feels like someone must have proposed this before)

-serge