Sending caps to userspace (Re: [GIT PULL] kdbus for 4.1-rc1)

From: Andy Lutomirski
Date: Wed Apr 15 2015 - 11:42:34 EST

[resending because gmail.]

tl;dr AFAICS systemd is the only thing using caps like this.
systemd's code for that appears to be exploitably buggy, but that bug
is mitigated by the fact that I haven't found any evidence that the
cap check does anything, so there's nothing to exploit. But it sure
seems like a single example of this mechanism being used with no real
purpose is not a good justificaiton for trying to support it for real.

On Apr 15, 2015 5:00 AM, "Greg Kroah-Hartman"
<gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> [Back to the capability discussion]
> On Tue, Apr 14, 2015 at 11:57:22AM -0700, Andy Lutomirski wrote:
> > >> Then I'll have to find a way to embolden my NACK further. My point is
> > >> that capturing garbage like cmdline and capabilities (again, that
> > >> latter part is completely unacceptable under any circumstances
> > >> whatsoever) on behalf of *all* senders is a disaster. If it's
> > >> optional, then I can at least hope that userspace will honor the
> > >> optionality and let everything turn it off. If it's mandatory, then
> > >> kdbus is just unsafe to use to send messages to untrusted parties.
> > >
> > > It's opted in by the receiving peer if the task implementing a service
> > > wants to access these pieces of information. It is optional, and the
> > > documentation clearly states that userspace should cope with this, and
> > > also, when they are available we make sure to provide the correct
> > > race-free information.
> > >
> > > As said many times before, an application can do so already today with
> > > information from other API file systems, so why is this suddenly a
> > > problem when kdbus optionally offers the exact same information along
> > > with each transmitted message? Yes, we all "hate" capabilities, but
> > > userspace uses them, and gets access to them all the time through the
> > > POSIX apis (capget(), cap_get_pid(), capgetp(), etc.) and through
> > > /proc/pid/status. They are something that we have to support and handle
> > > properly.
> > >
> > > In the very first submission of kdbus, we stated that we want to allow
> > > userspace methods to access these same bits to be able to make decisions
> > > about permissions. And to do so in a race-free manner, which is very
> > > hard, if not almost impossible, to do so from userspace alone.
> > >
> > > For instance, if a task has CAP_NET_ADMIN set, we can use that
> > > information in order to allow or disallow certain actions to be taken by
> > > a privileged process. Or, if a client that has the capability to call
> > > reboot (i.e. have CAP_SYS_REBOOT) makes the D-Bus call to reboot the
> > > system, the system daemon listening for that message knows that yes, at
> > > the time that the client made that call, it really did have that
> > > capability so it is ok to actually reboot the system.
> > >
> > > Instead of trying to use SCM_CREDENTIALS to get the pid and another
> > > round of cap_get_pid() and the like, all of which are susceptable to
> > > racing and all sorts of other horrors, that are insecure, we can provide
> > > this information in an atomic, and secure way.
> >
> > /me suppresses a long string of expletives.
> >
> > Please point me at the code that does this with caps. It's WRONG in
> > userspace and it's WRONG in the kernel. I want to know what code that
> > runs on my system does this so I can send the appropriate bug reports
> > and get it fixed. I think the RHEL crowd at least will take it
> > seriously when I tell them that this is a security hole.
> Look at how polkit and login manager work. Or anything that uses
> SCM_CREDENTIALS. Also I think PAM does odd things with credentials, but
> it's been a long time since I looked at any PAM code, I could be wrong.
> Also look at users of SO_PEERCRED, as those are used in places as well,
> but you know all about those.

Huh? The only thing the unix socket creds crap has to do with caps is
that it gives you a pid. AFAICT, and I could be wrong, systemd is the
only code that came up with the dubious idea of mapping
AF_UNIX-derived pids to caps.

> Also look at programs that make those capability calls, they are
> obviously using them for some reason, right? Nothing we can do about
> them, and it's not the main issue here at all, sorry for the
> side-discussion.

I looked. AFAICT polkit doesn't use caps. Systemd does (look for
VTABLE_CAP and the associated code) for reasons that escape me. I
have yet to find a single cap-guarded method in the systemd code base
that isn't also guarded by dbus policy and/or polkit.

That latter part is important for two reasons. Reason 1: the cap code
doesn't seem to do anything, which makes it hard for me to break.
(It's also spaghetti strung across lots of files, so I could be
wrong.). Reason 2: if the cap code isn't actually serving a security
purpose, correctly or otherwise, why is it there? Can't it just be

> > > The kernel today, and userspace, relies on capabilities all the time
> > > (i.e. almost every syscall), how are they something that is somehow not
> > > valid to use and support?
> >
> > No. The *kernel* relies on caps. Userspace should not.
> Userspace uses caps to have the kernel do things. Or not do things. If
> not, why do we have things like SCM_CREDINTIALS in the first place?

For uid, which is much more sensible for this purpose than caps.

(And note that even uid is b0rked a bit here and that we're forced to
use ruid and not euid to avoid a nasty exploitable design error. At
least kdbus avoids that particular issue.)

> > > And of course, as Eric will point out, capabailities are not
> > > translatable across user namespaces, which is a problem. Because of
> > > this, we dispose of that piece of metadata information when a message
> > > crosses a user namespace boundry. This is the right thing to do, which
> > > is not the case for almost all other kernel apis which report bogus
> > > capabilies when user namespaces are crossed.
> >
> > The right thing to do is to not use capabilities for userspace stuff.
> Again, userspace needs them in order to have the kernel do things for
> userspace as needed. Look at the Tizen example in the first email,
> where they had to use SCM_CREDENTIALS, and all of the speed/latency
> issues that this resulted in.

I haven't the faintest clue what Tizen wants to do that doesn't map
cleanly onto uid. Android has a very rich, if
awful-for-business-reasons, security model, and it does it with just
uid and gid.

> > > So we implemented this correctly, and somehow that is a feature so bad
> > > that both you and Eric think the whole baby should be thrown out? How
> > > else should this be implemented?
> >
> > It shouldn't be implemented.
> Great, so can we also drop those POSIX functions and the /proc/
> information as well? I didn't think so :)

Well, the /proc information is exploitably useless, but systemd still
has hundreds of lines of code to use it.

I would *love* to rip out and replace SCM_CREDENTIALS, but we have a
little ABI compatibility issue there.

> > > As documented in the original email on this thread, Tizen wants to use
> > > this, as it solves a real need that they have. Their workarounds
> > > involve using custom UDS sockets, but the latency involved is horrid and
> > > unacceptable. Using a kdbus message solves this issue for them,
> > > allowing UI rendering to work properly/quickly.
> > >
> > > Again, capabilities are something we all require and rely on today,
> > > passing the current capability on to a recipient isn't a way to raise
> > > privileges at all, but rather, properly determine if they are present
> > > at sending time, if wanted. How does that create an insecure system?
> > > What am I missing that is so bad here with the design we have?
> >
> > That, even if the implementation could be made to be useful and
> > correct, capabilities refer to privileges wrt the kernel, not
> > userspace. They're not the right bit of policy to look at here.
> So what is the right bit of policy to look at then?

Uid or, in certain cases, maybe cgroup. In some very well designed
systems, possession of an fd is much better than any of the above.

> > For example, the thing that should make it possible to run 'systemctl
> > reboot' or whatever is not CAP_SYS_BOOT, because CAP_SYS_BOOT is the
> > permission to hard reboot the system immediately, and that's not what
> > 'systemctl reboot' is for.
> 'systemctl reboot' calls a bunch of other things to determine if you
> have local access to the machine, or permissions to reboot the machine
> (i.e. CAP_SYS_BOOT), and other things that polkit might allow you to do,
> and then, it decides to reboot or not. That happens today, right? I
> don't understand the argument here.

Those other things are probably okay. I actually tried setcap
cap_sys_boot=eip dbus-send, and I still couldn't make systemctl reboot
work in an ssh session. In my book, that's a good thing.

So what's the point of the cap check again?

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at