Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters

From: Aleksa Sarai
Date: Sat May 11 2019 - 11:51:13 EST

On 2019-05-11, Christian Brauner <christian@xxxxxxxxxx> wrote:
> > In my opinion, the problems here are:
> >
> > - Apparently some people run untrusted containers without user
> > namespaces. It would be really nice if people could not do that.
> > (Probably the biggest problem here.)
> I know I sound like a broken record since I've been going on about this
> forever together with a lot of other people but honestly,
> the fact that people are running untrusted workloads in privileged containers
> is the real issue here.

I completely agree. It's a shit-show, and it's caused by bad defaults in
Docker and (now) podman. To be fair, they both now support rootless
containers but the default is still privileged containers.

They do support user namespaces (though it should be noted that LXD's
support is much nicer from a security standpoint) but unless it's the
default the support is almost pointless. In the case of Docker it can
lead to some usability issues when you enable it (which I believe is the
main justification for it not being the default).

> Aleksa is a good friend of mine and we have discussed this a lot so I hope
> he doesn't hate me for saying this again: it is crazy that there are container
> runtimes out there that promise (or at least do not state the opposite)
> containers without user namespaces or containers with user namespaces
> that allow to map the host root id to anything can be safe. They cannot.

Yeah, the fact that we (runc) don't scream from the rooftops that this
setup is insecure is definitely a problem. I have mentioned this
whenever I've had a chance, but the fact that the most popular runtimes
(which use runc) don't use user namespaces compounds the issue. I'm
willing to bet that >90% of users of runc-based runtimes don't use user
namespaces at all, and this is all down to bad defaults.

There are also some other misfeatures we have in runc that we're
basically forced to support because some users use them, and we can't
really break entire projects (even though it's the projects' fault they
have an insecure setup).

> It seems to me to be heading in the wrong direction to keep up the
> illusion that with enough effort we can make this all nice and safe.
> Yes, the userspace memfd hack we came up with is as ugly as a security
> patch can be but if you make promises you can't keep you better be
> prepared to pay the price when things start to fall apart.

> So if this part of the patch is just needed to handle this do we really
> want to do all that tricky work or is there more to gain from this that
> makes it worth it.

I dropped this patch in v7; I don't think it's required for the
overarching feature. Looking back on it, it doesn't make much sense
given the context that privileged containers are unsafe in the first
place.

I do think that being able to block introspection might be a useful
hardening feature though. During attachment it would be nice to be sure
that nothing will be able to touch the attaching process's /proc/$pid --
even itself.

Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
