Re: v2 of seccomp filter c/r patches

From: Andy Lutomirski
Date: Mon Sep 14 2015 - 13:53:14 EST


On Sep 11, 2015 10:28 AM, "Tycho Andersen" <tycho.andersen@xxxxxxxxxxxxx> wrote:
>
> On Fri, Sep 11, 2015 at 10:00:22AM -0700, Andy Lutomirski wrote:
> > On Fri, Sep 11, 2015 at 9:30 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> > > On Sep 10, 2015 5:22 PM, "Tycho Andersen" <tycho.andersen@xxxxxxxxxxxxx> wrote:
> > >>
> > >> Hi all,
> > >>
> > >> Here is v2 of the seccomp filter c/r set. The patch notes have individual
> > >> changes from the last series, but there are two points not noted:
> > >>
> > >> * The series still does not allow us to correctly restore state for programs
> > >> that will use SECCOMP_FILTER_FLAG_TSYNC in the future. Given that we want to
> > >> keep seccomp_filter's identity, I think something along the lines of another
> > >> seccomp command like SECCOMP_INHERIT_PARENT is needed (although I'm not sure
> > >> if this can even be done yet). In addition, we'll need a kcmp command for
> > >> figuring out if filters are the same, although this too needs to compare
> > >> seccomp_filter objects, so it's a little screwy. Any thoughts on how to do
> > >> this nicely are welcome.
> > >
> > > Let's add a concept of a seccompfd.
> > >
> > > For background of what I want to add: I want to be able to create a
> > > seccomp monitor. A seccomp monitor will be, logically, a pair of a
> > > struct file that represents the monitor and a seccomp_filter that is
> > > controlled by the monitor. Depending on flags, whoever holds the
> > > monitor fd could change the active filter, intercept syscalls, and
> > > issue syscalls on behalf of a process that is trapped in an
> > > intercepted syscall.
> > >
> > > Seccomp filters would nest properly.
> > >
> > > The interface would probably be (extremely pseudocoded):
> > >
> > > monitor_fd, filter_fd = seccomp(CREATE_MONITOR, flags, ...);
> > >
> > > Then, later:
> > >
> > > seccomp(ATTACH_TO_FILTER, filter_fd); /* now filtered */
> > >
> > > read(monitor_fd, buf, size); /* returns an intercepted syscall */
> > > write(monitor_fd, buf, size); /* issues a syscall or releases the
> > > trapped task */
> > >
> > > This can't be implemented on x86 without either going insane or
> > > finishing the massive set of pending cleanups to the x86 entry code.
> > > I favor the latter.
> > >
> > > We could, however, add part of it right now: we could have a way to
> > > create a filterfd, we could add kcmp support for it, and we could add
> > > the ATTACH_TO_FILTER thing. I think that would solve your problem.
> > >
> > > One major open question: does a filter_fd know what its parent is and,
> > > if so, will it just refuse to attach if the caller's parent is wrong?
> > > Or will a filter_fd attach anywhere.
> > >
> >
> > Let me add one more thought:
> >
> > Currently, struct seccomp_filter encodes a strict tree hierarchy: it
> > knows what its parent is. This only matters as an implementation
> > detail and because TSYNC checks for seccomp_filter equality.
> >
> > We could change this without user-visible effects. We could say that,
> > for TSYNC purposes, two filter states match if they contain exactly
> > the same layers in the same order where a layer does *not* encode a
> > concept of parent. We could then say that attaching a classic bpf
> > filter creates a branch new layer that is not equal to any other layer
> > that's been created.
> >
> > This has no effect whatsoever. The difference would be that we could
> > declare that attaching the same ebpf program twice creates the *same*
> > layer so that, if you fork and both children attach the same ebpf
> > program, then they match for TSYNC purposes.
>
> Would you keep struct seccomp_filter identity here (meaning that you'd
> reach over and grab the seccomp_filter from a sibling thread if it
> existed)? Would it only work for the last filter attached to siblings,
> or for all the filters? This does make my life easier, but I like the
> idea of just using seccompfd directly below as it seems somewhat
> easier (for me at least) to understand,
>

If we did that, it would just be an internal optimization.

> > Similarly, attaching the
> > same hypothetical filterfd would create the same layer.
>
> If we change the api of my current set to have the ptrace commands
> iterate over seccomp fds, it looks something like:
>
> seccompfd = ptrace(GET_FILTER_FD, pid);
> while (ptrace(NEXT_FD, pid, seccompfd) == 0) {
> if (seccomp(CHECK_INHERITED, seccompfd))
> break;
>
> bpffd = seccomp(GET_BPF_FD, seccompfd);
> err = buf(BPF_PROG_DUMP, bpffd, &attr);
> /* save the bpf prog */
> }
>
> then restore can look like:
>
> while (have_noninherited_filters()) {
> filter = load_filter();
> bpffd = bpf(BPF_PROG_LOAD, filter);
> seccompfd = seccomp(SECCOMP_FD_CREATE, bpffd);
>
> filters[n_filters++] = seccompfd;
> }
>
> /* fork any children as necessary and do the rest of the restore */
>
> for (i = 0; i < n_filters; i++) {
> seccomp(SECCOMP_FD_INSTALL, filters[i]);
> }
>
> then the only question is how to implement the CHECK_INHERITED command
> on dump.

I don't think it would be a well defined operation. I think you'd
have to ask "for this pid, give me the nth thing in the stack", since
an fd identifying a layer without reference to its parent would no
longer even be guaranteed to be unique in the filter stack for a given
task.

I'm not sure I entirely like this solution...

>
> If we support the above API, we don't need to think about the concept
> of layers at all, or do any extra work on filter install to preserve
> struct seccomp_filter identity, it just comes naturally.
>
> Tycho
>
> > Thoughts?
> >
> > --Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/