Re: [kernel-hardening] 2015 kernel CVEs
From: Jann Horn
Date: Wed Jan 20 2016 - 15:11:33 EST
On Tue, Jan 19, 2016 at 04:47:32PM -0600, Eric W. Biederman wrote:
> Dan Carpenter <dan.carpenter@xxxxxxxxxx> writes:
>
> > I like to look back over old CVEs to see how we could do better. Here
> > is the list from 2015. I got most of this information from the Ubuntu
> > CVE tracker. Thanks Ubuntu! If it doesn't have a hash that means it
> > might not be fixed yet.
> >
> > CVE-2015-8709 : ptrace: race in user namespaces lets users trace root processes
>
> As this isn't a kernel bug,

I agree that it's not a kernel bug and not a kernel race - userspace
developers assumed security guarantees that the kernel didn't actually
provide.

However, I think that the kernel is missing documentation here and that
namespaces are designed somewhat unfortunately. A container that can be
created and securely, robustly entered by an unprivileged user would
have to work like this under the current rules as far as I can tell:

To create the container:

  setsid
    [prevent tty pushback via /dev/tty]
  set up tty IO forwarding if necessary
    [prevents tty pushback, possibly additional filtering]
  unshare(CLONE_NEWUSER) to create a "purgatory" user ns.
    Map the container owner to uid 0, map all uids that should be mapped into
    the container (including the container root) to 1 and higher (where
    1 is the container root).
  stash FD to the purgatory user namespace somewhere in the outer ns
  drop all privileges (open fds, ...)
  setresuid(1,1,1) [still protected against ptrace by nondumpability]
  unshare(CLONE_NEWUSER) to create the container's user ns
    [From here on, we can be ptraced by the ns root user from outside.
    The ns root user could ptrace us from outside at this point and
    see the outer namespaces through us, but that's okay, he'd have
    to already be in the outer user ns for that.]
  set up other namespaces for the container
  stash FDs to the container namespaces in the purgatory ns
  let a process in the purgatory map the container uids and gids
  do security-relevant setup work (set up bind mounts, ...)
    [be careful here, don't trust any files in container-controlled
    filesystem parts]
  do security-irrelevant setup work
  execlp("init")
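
For illustration, the purgatory mapping from the creation steps could be prepared roughly as follows. This is a minimal sketch under assumptions, not code from any existing container runtime. The helper names (format_purgatory_uid_map, write_uid_map, create_userns) and the example uid range are hypothetical. Writing a map with more than one line requires CAP_SETUID in the parent user namespace, so an unprivileged container owner would have to go through something like newuidmap rather than write the file directly.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Format the purgatory uid_map: the container owner becomes uid 0, and a
 * range of outer uids (first_sub_uid, count) becomes uids 1 and higher,
 * where 1 is the container root. Each line is "inside outside length".
 * Returns the string length, or -1 if buf is too small.
 */
int format_purgatory_uid_map(char *buf, size_t len, uid_t owner_uid,
                             uid_t first_sub_uid, unsigned count)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "0 %u 1\n1 %u %u\n",
                     (unsigned)owner_uid, (unsigned)first_sub_uid, count);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Write a prepared map to /proc/<pid>/uid_map. The whole map has to go
 * into a single write(), and the file can only be written once.
 * Assumption: the caller has the privileges needed for this map.
 */
int write_uid_map(pid_t pid, const char *map)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%ld/uid_map", (long)pid);

    int fd = open(path, O_WRONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    ssize_t want = (ssize_t)strlen(map);
    ssize_t got = write(fd, map, (size_t)want);
    close(fd);
    return got == want ? 0 : -1;
}

/* Create a new user namespace for the calling process. */
int create_userns(void)
{
    return unshare(CLONE_NEWUSER);
}
```

With a container owner of uid 1000 and an (assumed) subordinate range starting at 100000, format_purgatory_uid_map() yields the two lines "0 1000 1" and "1 100000 65536".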

Then, to enter the container:

  setsid
    [prevent tty pushback via /dev/tty]
  set up tty IO forwarding if necessary
    [prevents tty pushback, possibly additional filtering]
  Enter the purgatory user ns, referenced through an FD
  setresuid(1, 1, 1) [still protected against ptrace by nondumpability]
  enter container namespaces, but not the user namespace yet
    [We don't really trust the namespace FDs supplied by the setup
    process because they were sent after the ns root user gained
    ptrace access, but that's okay because we can only move downward
    using setns(), so we end up in namespaces below the purgatory
    that are owned by the namespace root. That's good enough.]
  drop privileges (open fds, ...)
  enter container user namespace [ns root gains ptrace access]
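
The enter sequence is built on setns() with namespace FDs. Below is a minimal sketch of the two primitives involved, assuming the FDs were originally obtained by opening /proc/<pid>/ns/<name> (the steps above do not say how they are obtained). The helper names are hypothetical. Passing the expected nstype makes the kernel refuse an FD that refers to a different kind of namespace, which is a cheap sanity check on FDs that are not fully trusted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Build "/proc/<pid>/ns/<name>", e.g. name = "user" or "mnt".
 * Returns the string length, or -1 on a bad name or a too-small buffer.
 */
int format_ns_path(char *buf, size_t len, pid_t pid, const char *name)
{
    assert(buf != NULL && name != NULL);
    if (name[0] == '\0' || strchr(name, '/') != NULL)
        return -1; /* only a plain namespace name is accepted */

    int n = snprintf(buf, len, "/proc/%ld/ns/%s", (long)pid, name);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Open a namespace FD that can be stashed and later passed to setns(). */
int open_ns_fd(pid_t pid, const char *name)
{
    char path[64];
    if (format_ns_path(path, sizeof(path), pid, name) < 0)
        return -1;
    return open(path, O_RDONLY | O_CLOEXEC);
}

/*
 * Join the namespace behind fd. nstype (e.g. CLONE_NEWUSER, CLONE_NEWNS)
 * tells the kernel which kind of namespace we expect; a mismatch fails
 * with EINVAL instead of silently joining something else.
 */
int enter_ns(int fd, int nstype)
{
    return setns(fd, nstype);
}
```

In the sequence above, enter_ns(purgatory_fd, CLONE_NEWUSER) would come first, the other container namespaces next, and enter_ns(container_userns_fd, CLONE_NEWUSER) last, since that is the point where the ns root gains ptrace access.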

The purgatory user ns is necessary because without privileges in the
container's parent user namespace, it's not possible to switch to the
container root uid prior to entering it (except with an ugly hack
involving a temporary namespace, newuidmap and a (possibly temporary)
setuid binary), and more importantly, even given access to the
container's root uid, it's not possible to actually enter the
container without having the container owner's euid unless you have
CAP_SYS_ADMIN in the outer namespace.

(Of course, this could be simplified with a setuid root helper, but I
don't think anyone wants more of those to be necessary.)

> and is not a race, and no one has even
> bothered to see if any userspace processes are this stupid I don't even
> think that qualifies as a CVE.

I know of at least two projects that enter user namespaces without the
necessary care; one of them is LXC.

> There is room for improvement in this area but I don't see how this
> qualifies as a CVE.

I think I agree with that.