Referencing vfsmounts in userspace using a file descriptor:
http://marc.theaimsgroup.com/?l=linux-kernel&m=109871948812782&w=2
Why not just use /proc/PID/fd/FD?
In what sense? readlink of /proc/PID/fd/* will provide a pathname
relative to current's root: useless for any paths not in current's
namespace.
Not readlink, but actual dereference of link will give you exactly the
vfsmount/dentry the file was opened on. If you want to bind/move
whatever on that mount, that's possible, even if it's a "detached
tree".
Also, if we were to hijack /proc/PID/fd/* for cross namespace
manipulation, then we'd be enabling any root user on the system to
modify anyone's namespace.
No. Obviously there must be limitations on which process can access
/proc/PID/fd. It's obviously safe to allow 'self' for example. But
any process you can ptrace should also be safe.
This interface also has the huge advantage that you gain all the goodies
of using file descriptors, such as SCM_RIGHTS. You can hand of entire
trees of mountpoints between applications without ever even binding them
to any namespace whatsoever.
Why is that dependent on the interface? The /proc interface already
allows _everything_ you want to do with file descriptors. Only the
the necessary restrictions should be worked out.
Tie this in with some userspace code that can mount devices for users
with restrictions and appropriate policy, you can create some API+daemon
for regular user apps to get things mounted in a way that guarantees
hiding from other users.
Yes. And I think that can be done without any new magic ioctls. Just
with mount/umount and /proc.
The implementation is another question. I'll look through your
patches to see how you achieve all this.