Re: [PATCH v3 0/7] User namespace mount updates

From: Austin S Hemmelgarn
Date: Wed Nov 18 2015 - 10:35:29 EST

On 2015-11-18 09:58, Al Viro wrote:
> On Wed, Nov 18, 2015 at 08:22:38AM -0600, Seth Forshee wrote:
>
>> But it still requires the admin set it up that way, no? And aren't
>> privileges required to set up those devices in the first place?
>>
>> I'm not saying that it wouldn't be a good idea to lock down the backing
>> stores for those types of devices too, just that it isn't something that
>> a regular user could exploit without an admin doing something to
>> facilitate it.
>
> Sigh... If it boils down to "all admins within all containers must be
> trusted not to try and break out" (along with "roothole in any container
> escalates to kernel-mode code execution on host"), then what the fuck
> is the *point* of bothering with containers, userns, etc. in the first
> place? If your model is basically "you want isolation, just use kvm",
> fine, but where's the place for userns in all that?

In this case, Seth is referring to the host admin, not the container admin.

> And if you are talking about the _host_ admin, then WTF not have him just
> mount what's needed as part of setup and to hell with mounting those
> inside the container?

This is decidedly non-trivial to handle in some cases. IIRC, one of the particular things that sparked this in the first place was the Chrome Native Client needing CAP_SYS_ADMIN or the SUID bit set on it in order to set up its own sandbox, which is not something that should ever be set on an executable that runs untrusted code (which is the whole point of NaCl).

> Look at that from the hosting company POV - they are offering a bunch of
> virtual machines on one physical system. And you want the admins on those
> virtual machines independent from the host admin. Fine, but then you
> really need to keep them unable to screw each other or gain kernel-mode
> execution on the host.
>
> Again, what's the point of all that? I assumed the model where containers
> do, you know, contain what's in them, regardless of trust. You guys seem
> to assume something different and I really wonder what it _is_...

Yes, hosting and isolation of untrusted code are valid uses for containers, which is why I suggested the ability to disallow mounts other than FUSE, and to make that the default state. There are other perfectly valid uses for them as well, and the two I'm particularly interested in are safe deployment of a new system from an existing system (à la Gentoo or Arch installation, or manual installation of *BSD), and running non-native distros without virtualization (on a single-user system, virtualization is overkill when all you want is a Debian or Fedora or Arch testing environment and you don't care about their specific kernel features).
