Re: [CRIU] Introspecting userns relationships to other namespaces?

From: Andrew Vagin
Date: Tue Jul 12 2016 - 23:44:03 EST


On Sat, Jul 09, 2016 at 01:29:20PM -0500, Eric W. Biederman wrote:
> ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes:
>
> > Andrew Vagin <avagin@xxxxxxxxxxxxx> writes:
> >
> >> All these thoughts about security make me think that kcmp is what we
> >> should use here. Maybe something like this:
> >>
> >> kcmp(pid1, pid2, KCMP_NS_USERNS, fd1, fd2)
> >>
> >> - to check whether the userns of the fd1 namespace is equal to the fd2 userns
> >>
> >> kcmp(pid1, pid2, KCMP_NS_PARENT, fd1, fd2)
> >>
> >> - to check whether the parent namespace of the fd1 pidns is equal to the fd2 pidns.
> >>
> >> fd1 and fd2 are file descriptors for namespace files.
> >>
> >> So if we want to build a hierarchy, we need to collect all namespaces
> >> and then enumerate them to check dependencies with the help of kcmp.
> >
> > That is certainly one way to go.
> >
> > There is a funny case where we would want to compare a user namespace
> > file descriptor to a parent user namespace file descriptor.
> >
> >
> > Grumble, Grumble. I think this may actually be a case for creating ioctls
> > for these two cases. Now that random nsfs file descriptors are bind
> > mountable the original reason for using proc files is not as pressing.
> >
> > One ioctl for the user namespace that owns a file descriptor.
> > One ioctl for the parent namespace of a namespace file descriptor.
> >
> > We also need some way to get a command file descriptor for a file system
> > super block. Al Viro has a pet project for cleaning up the mount API
> > and this might be the ideal excuse to start looking at that.
> >
> > (In principle we might be able to run commands through the namespace
> > file descriptor and using an ioctl feels dirty. But an ioctl that
> > only uses the fd and request argument does not suffer from the same
> > problems that ioctls that have to pass additional arguments suffer
> > from.)
>
> Of course it should be an error, perhaps -EINVAL, to get a user
> namespace owner or parent namespace that is outside of a process's
> current user namespace or pid namespace. That way things stay bounded
> within the namespaces the process is currently in, which prevents any
> possibility of leaks and keeps CRIU working.

I prepared patches with ioctls to see what this would look like.

Here is the whole series:
https://github.com/avagin/linux-task-diag/commits/namespaces

Here is a patch to get an owning user namespace:
https://github.com/avagin/linux-task-diag/commit/7fad8ff3fc4110bebf0920cec2388390b3bd2238
https://github.com/avagin/linux-task-diag/commit/2663bc803d324785e328261f3c07a0fef37d2088

Here is an example of how it looks from user space:
https://github.com/avagin/linux-task-diag/blob/namespaces/tools/testing/selftests/nsfs/owner.c#L49
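
For readers who do not want to follow the link, a minimal user-space
sketch of the two ioctls might look roughly like the following. It is not
copied from owner.c. The request names and numbers (NSIO, NS_GET_USERNS,
NS_GET_PARENT) and all helper names are assumptions made for illustration,
so take the real definitions from the headers in the series. Comparing
st_dev and st_ino is used here as a stand-in for the kcmp idea discussed
above.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Assumed request numbers, for illustration only. Use the definitions
 * from the kernel headers of the series if they are available. */
#ifndef NSIO
#define NSIO 0xb7
#endif
#ifndef NS_GET_USERNS
#define NS_GET_USERNS _IO(NSIO, 0x1)
#endif
#ifndef NS_GET_PARENT
#define NS_GET_PARENT _IO(NSIO, 0x2)
#endif

/* Return a new fd for the user namespace that owns nsfd,
 * or -1 with errno set. */
static int ns_get_owner(int nsfd)
{
	return ioctl(nsfd, NS_GET_USERNS);
}

/* Return a new fd for the parent of a hierarchical namespace
 * (user or pid), or -1 with errno set. */
static int ns_get_parent(int nsfd)
{
	return ioctl(nsfd, NS_GET_PARENT);
}

/* Return 1 if both fds refer to the same file (same device and inode),
 * 0 if they differ, -1 if fstat() fails on either fd. */
static int ns_same(int fd1, int fd2)
{
	struct stat a, b;

	if (fstat(fd1, &a) < 0 || fstat(fd2, &b) < 0)
		return -1;
	return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/* Count the ancestors of nsfd that the caller is allowed to see.
 * The walk stops at the first error, which is what we expect at the
 * boundary of the caller's own namespace. Every fd opened here is
 * closed again; the caller keeps ownership of nsfd. */
static int ns_count_ancestors(int nsfd)
{
	int depth = 0;
	int cur = ns_get_parent(nsfd);

	while (cur >= 0) {
		int next = ns_get_parent(cur);

		close(cur);
		cur = next;
		depth++;
	}
	return depth;
}
```

A caller would open /proc/self/ns/pid, pass the fd to ns_get_owner(), and
compare the result against an fd for /proc/self/ns/user with ns_same().
On a kernel without the patches the ioctl simply fails and the helpers
return -1.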

I like the ioctl approach. James, Michael, Trevor, what is your
opinion on this?

>
> Eric