Re: [PATCH v4 4/4] seccomp: add support for passing fds via USER_NOTIF

From: Andy Lutomirski
Date: Fri Jun 22 2018 - 14:21:19 EST




> On Jun 22, 2018, at 9:23 AM, Jann Horn <jannh@xxxxxxxxxx> wrote:
>
>> On Fri, Jun 22, 2018 at 12:05 AM Tycho Andersen <tycho@xxxxxxxx> wrote:
>>
>> The idea here is that the userspace handler should be able to pass an fd
>> back to the trapped task, for example so it can be returned from socket().
>>
>> I've proposed one API here, but I'm open to other options. In particular,
>> this only lets you return an fd from a syscall, which may not be enough in
>> all cases. For example, if an fd is written to an output parameter instead
>> of returned, the current API can't handle this. Another case is that
>> netlink takes as input fds sometimes (IFLA_NET_NS_FD, e.g.). If netlink
>> ever decides to install an fd and output it, we wouldn't be able to handle
>> this either.
>>
>> Still, the vast majority of interesting cases are covered by this API, so
>> perhaps it is Enough.
>>
>> I've left it as a separate commit for two reasons:
>> * It illustrates the way in which we would grow struct seccomp_notif and
>> struct seccomp_notif_resp without using netlink
>> * It shows just how little code is needed to accomplish this :)
>>
> [...]
>> @@ -1669,10 +1706,20 @@ static ssize_t seccomp_notify_write(struct file *file, const char __user *buf,
>> goto out;
>> }
>>
>> + if (resp.return_fd) {
>> + knotif->flags = resp.fd_flags;
>> + knotif->file = fget(resp.fd);
>> + if (!knotif->file) {
>> + ret = -EBADF;
>> + goto out;
>> + }
>> + }
>> +
>
> I think this is a security bug. Imagine the following scenario:
>
> - attacker creates processes A and B
> - process A installs a seccomp filter and sends the notification fd
> to process B
> - process A starts a syscall for which the filter returns
> SECCOMP_RET_USER_NOTIF
> - process B reads the notification from the notification fd
> - process B uses dup2() to copy the notification fd to file
> descriptor 1 (stdout)
> - process B executes a setuid root binary
> - the setuid root binary opens some privileged file descriptor
> (something like open("/etc/shadow", O_RDWR))
> - the setuid root binary tries to write some attacker-controlled data to stdout
> - seccomp_notify_write() interprets the start of the written data as
> a struct seccomp_notif_resp
> - seccomp_notify_write() grabs the privileged file descriptor and
> installs a copy in process A
> - process A now has access to the privileged file (e.g. /etc/shadow)
>
> It isn't clear whether it would actually be exploitable - you'd need a
> setuid binary that performs the right actions - but it's still bad.

Jann is right. ->read and ->write must not reference any of the calling taskâs state except the literal memory passed in.

>
> Unless I'm missing something, can you please turn the ->read and
> ->write handlers into an ->unlocked_ioctl handler? Something like
> this:
>
> struct seccomp_user_notif_args {
> u64 buf;
> u64 size;
> };
>
> static long unlocked_ioctl(struct file *file, unsigned int cmd,
> unsigned long arg)
> {
> struct seccomp_user_notif_args args;
> struct seccomp_user_notif_args __user *uargs;
>
> if (cmd != SECCOMP_USER_NOTIF_READ && cmd != SECCOMP_USER_NOTIF_WRITE)
> return -EINVAL;
>
> if (copy_from_user(&args, uargs, sizeof(args)))
> return -EFAULT;
>
> switch (cmd) {
> case SECCOMP_USER_NOTIF_READ:
> return seccomp_notify_read(file, (char __user
> *)args.buf, (size_t)args.size);
> case SECCOMP_USER_NOTIF_WRITE:
> return seccomp_notify_write(file, (char __user
> *)args.buf, (size_t)args.size);
> default:
> return -EINVAL;
> }
> }