Re: [PATCH v7 1/6] seccomp: add a return code to trap to userspace

From: Tycho Andersen
Date: Thu Sep 27 2018 - 19:08:14 EST


On Thu, Sep 27, 2018 at 03:45:11PM -0700, Kees Cook wrote:
> On Thu, Sep 27, 2018 at 2:51 PM, Jann Horn <jannh@xxxxxxxxxx> wrote:
> > On Thu, Sep 27, 2018 at 5:11 PM Tycho Andersen <tycho@xxxxxxxx> wrote:
> >> However, care should be taken to avoid the TOCTOU
> >> +mentioned above in this document: all arguments being read from the tracee's
> >> +memory should be read into the tracer's memory before any policy decisions are
> >> +made. This allows for an atomic decision on syscall arguments.
> >
> > Again, I don't really see how you could get this wrong.
>
> Doesn't hurt to mention it, IMO.
>
> >> +static long seccomp_notify_send(struct seccomp_filter *filter,
> >> + unsigned long arg)
> >> +{
> >> + struct seccomp_notif_resp resp = {};
> >> + struct seccomp_knotif *knotif = NULL;
> >> + long ret;
> >> + u16 size;
> >> + void __user *buf = (void __user *)arg;
> >> +
> >> + if (copy_from_user(&size, buf, sizeof(size)))
> >> + return -EFAULT;
> >> + size = min_t(size_t, size, sizeof(resp));
> >> + if (copy_from_user(&resp, buf, size))
> >> + return -EFAULT;
> >> +
> >> + ret = mutex_lock_interruptible(&filter->notify_lock);
> >> + if (ret < 0)
> >> + return ret;
> >> +
> >> + list_for_each_entry(knotif, &filter->notif->notifications, list) {
> >> + if (knotif->id == resp.id)
> >> + break;
> >> + }
> >> +
> >> + if (!knotif || knotif->id != resp.id) {
> >
> > Uuuh, this looks unsafe and wrong. I don't think `knotif` can ever be
> > NULL here. If `filter->notif->notifications` is empty, I think
> > `knotif` will be `container_of(&filter->notif->notifications, struct
> > seccom_knotif, list)` - in other words, you'll have a type confusion,
> > and `knotif` probably points into some random memory in front of
> > `filter->notif`.
> >
> > Am I missing something?
>
> Oh, good catch. This just needs to be fixed like it's done in
> seccomp_notif_recv (separate cur and knotif).
>
> >> +static struct file *init_listener(struct task_struct *task,
> >> + struct seccomp_filter *filter)
> >> +{
> >
> > Why does this function take a `task` pointer instead of always
> > accessing `current`? If `task` actually wasn't `current`, I would have
> > concurrency concerns. A comment in seccomp.h even explains:
> >
> > * @filter must only be accessed from the context of current as there
> > * is no read locking.
> >
> > Unless there's a good reason for it, I would prefer it if this
> > function didn't take a `task` pointer.
>
> This is to support PTRACE_SECCOMP_NEW_LISTENER.
>
> But you make an excellent point. Even TSYNC expects to operate only on
> the current thread group. Hmm.
>
> While the process is stopped by ptrace, we could, in theory, update
> task->seccomp.filter via something like TSYNC.
>
> So perhaps use:
>
> mutex_lock_killable(&task->signal->cred_guard_mutex);
>
> before walking the notify_locks?

This means that all the seccomp/ptrace code probably needs to be
updated for this? I'll try to send patches for this as well as the
return code thing Jann pointed out.

> >
> >> + struct file *ret = ERR_PTR(-EBUSY);
> >> + struct seccomp_filter *cur, *last_locked = NULL;
> >> + int filter_nesting = 0;
> >> +
> >> + for (cur = task->seccomp.filter; cur; cur = cur->prev) {
> >> + mutex_lock_nested(&cur->notify_lock, filter_nesting);
> >> + filter_nesting++;
> >> + last_locked = cur;
> >> + if (cur->notif)
> >> + goto out;
> >> + }
> >> +
> >> + ret = ERR_PTR(-ENOMEM);
> >> + filter->notif = kzalloc(sizeof(*(filter->notif)), GFP_KERNEL);
> >
> > sizeof(struct notification) instead, to make the code clearer?
>
> I prefer what Tycho has: I want to allocate an instances of whatever
> filter->notif is.
>
> Though, let's do the kzalloc outside of the locking, instead?

Yep, sounds good.

> >> + ret = anon_inode_getfile("seccomp notify", &seccomp_notify_ops,
> >> + filter, O_RDWR);
> >> + if (IS_ERR(ret))
> >> + goto out;
> >> +
> >> +
> >> + /* The file has a reference to it now */
> >> + __get_seccomp_filter(filter);
> >
> > __get_seccomp_filter() has a comment in it that claims "/* Reference
> > count is bounded by the number of total processes. */". I think this
> > change invalidates that comment. I think it should be fine to just
> > remove the comment.
>
> Update it to "bounded by total processes and notification listeners"?

Will do.

> >> +out:
> >> + for (cur = task->seccomp.filter; cur; cur = cur->prev) {
> >
> > s/; cur;/; 1;/, or use a while loop instead? If the NULL check fires
> > here, something went very wrong.
>
> Hm? This is correct. This is how seccomp_run_filters() walks the list too:
>
> struct seccomp_filter *f =
> READ_ONCE(current->seccomp.filter);
> ...
> for (; f; f = f->prev) {
>
> Especially if we'll be holding the cred_guard_mutex.

There is a last_locked local here though, I think that's what Jann is
pointing out.

Cheers,

Tycho