Re: v2 of seccomp filter c/r patches
From: Tycho Andersen
Date: Fri Sep 11 2015 - 13:28:15 EST
On Fri, Sep 11, 2015 at 10:00:22AM -0700, Andy Lutomirski wrote:
> On Fri, Sep 11, 2015 at 9:30 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> > On Sep 10, 2015 5:22 PM, "Tycho Andersen" <tycho.andersen@xxxxxxxxxxxxx> wrote:
> >>
> >> Hi all,
> >>
> >> Here is v2 of the seccomp filter c/r set. The patch notes have individual
> >> changes from the last series, but there are two points not noted:
> >>
> >> * The series still does not allow us to correctly restore state for programs
> >> that will use SECCOMP_FILTER_FLAG_TSYNC in the future. Given that we want to
> >> keep seccomp_filter's identity, I think something along the lines of another
> >> seccomp command like SECCOMP_INHERIT_PARENT is needed (although I'm not sure
> >> if this can even be done yet). In addition, we'll need a kcmp command for
> >> figuring out if filters are the same, although this too needs to compare
> >> seccomp_filter objects, so it's a little screwy. Any thoughts on how to do
> >> this nicely are welcome.
> >
> > Let's add a concept of a seccompfd.
> >
> > For background of what I want to add: I want to be able to create a
> > seccomp monitor. A seccomp monitor will be, logically, a pair of a
> > struct file that represents the monitor and a seccomp_filter that is
> > controlled by the monitor. Depending on flags, whoever holds the
> > monitor fd could change the active filter, intercept syscalls, and
> > issue syscalls on behalf of a process that is trapped in an
> > intercepted syscall.
> >
> > Seccomp filters would nest properly.
> >
> > The interface would probably be (extremely pseudocoded):
> >
> > monitor_fd, filter_fd = seccomp(CREATE_MONITOR, flags, ...);
> >
> > Then, later:
> >
> > seccomp(ATTACH_TO_FILTER, filter_fd); /* now filtered */
> >
> > read(monitor_fd, buf, size); /* returns an intercepted syscall */
> > write(monitor_fd, buf, size); /* issues a syscall or releases the
> > trapped task */
> >
> > This can't be implemented on x86 without either going insane or
> > finishing the massive set of pending cleanups to the x86 entry code.
> > I favor the latter.
> >
> > We could, however, add part of it right now: we could have a way to
> > create a filterfd, we could add kcmp support for it, and we could add
> > the ATTACH_TO_FILTER thing. I think that would solve your problem.
> >
> > One major open question: does a filter_fd know what its parent is and,
> > if so, will it just refuse to attach if the caller's parent is wrong?
> > Or will a filter_fd attach anywhere.
> >
>
> Let me add one more thought:
>
> Currently, struct seccomp_filter encodes a strict tree hierarchy: it
> knows what its parent is. This only matters as an implementation
> detail and because TSYNC checks for seccomp_filter equality.
>
> We could change this without user-visible effects. We could say that,
> for TSYNC purposes, two filter states match if they contain exactly
> the same layers in the same order where a layer does *not* encode a
> concept of parent. We could then say that attaching a classic bpf
> filter creates a brand new layer that is not equal to any other layer
> that's been created.
>
> This has no effect whatsoever. The difference would be that we could
> declare that attaching the same ebpf program twice creates the *same*
> layer so that, if you fork and both children attach the same ebpf
> program, then they match for TSYNC purposes.

Would you keep struct seccomp_filter identity here (meaning that you'd
reach over and grab the seccomp_filter from a sibling thread if it
existed)? Would it only work for the last filter attached to siblings,
or for all the filters? This does make my life easier, but I like the
idea of just using seccompfd directly below as it seems somewhat
easier (for me at least) to understand.

> Similarly, attaching the
> same hypothetical filterfd would create the same layer.
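
As a rough illustration of the layer-matching rule quoted above, here is
a minimal user-space model in C. It is not kernel code. The names
(struct filter_state, attach_classic, attach_ebpf, states_match) and the
use of integer layer ids are assumptions made up for this sketch: a
classic attach always mints a fresh layer, attaching the same eBPF
program reuses that program's layer, and two states match only if they
hold the same layers in the same order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LAYERS 16

/* Hypothetical model: a filter state is an ordered list of layer ids. */
struct filter_state {
    size_t n_layers;
    unsigned long layers[MAX_LAYERS];
};

/* Ids for classic attaches start high so that they never collide with
 * the small, caller-chosen eBPF program ids used in this model. */
static unsigned long next_classic_id = 1UL << 31;

static void push_layer(struct filter_state *s, unsigned long id)
{
    assert(s->n_layers < MAX_LAYERS);
    s->layers[s->n_layers++] = id;
}

/* Classic BPF attach: always a brand new layer, equal to no other. */
static void attach_classic(struct filter_state *s)
{
    push_layer(s, next_classic_id++);
}

/* eBPF attach: the layer is named by the program, so attaching the
 * same program twice yields the same layer. */
static void attach_ebpf(struct filter_state *s, unsigned long prog_id)
{
    assert(prog_id < (1UL << 31));
    push_layer(s, prog_id);
}

/* TSYNC-style comparison: same layers, same order, no parent pointer. */
static bool states_match(const struct filter_state *a,
                         const struct filter_state *b)
{
    if (a->n_layers != b->n_layers)
        return false;
    for (size_t i = 0; i < a->n_layers; i++)
        if (a->layers[i] != b->layers[i])
            return false;
    return true;
}
```

In this model, two children that copy the parent's state and then both
call attach_ebpf() with the same program id still match, while two
children that each call attach_classic() do not.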

If we change the API of my current set to have the ptrace commands
iterate over seccomp fds, it looks something like:

    seccompfd = ptrace(GET_FILTER_FD, pid);
    while (ptrace(NEXT_FD, pid, seccompfd) == 0) {
        if (seccomp(CHECK_INHERITED, seccompfd))
            break;
        bpffd = seccomp(GET_BPF_FD, seccompfd);
        err = bpf(BPF_PROG_DUMP, bpffd, &attr);
        /* save the bpf prog */
    }

then restore can look like:

    while (have_noninherited_filters()) {
        filter = load_filter();
        bpffd = bpf(BPF_PROG_LOAD, filter);
        seccompfd = seccomp(SECCOMP_FD_CREATE, bpffd);
        filters[n_filters++] = seccompfd;
    }

    /* fork any children as necessary and do the rest of the restore */

    for (i = 0; i < n_filters; i++) {
        seccomp(SECCOMP_FD_INSTALL, filters[i]);
    }

then the only question is how to implement the CHECK_INHERITED command
on dump.
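
One possible shape for that check, as a user-space model only: treat a
filter as inherited if the same object also appears in the parent's
chain. The struct filter below is a hypothetical stand-in for struct
seccomp_filter, and is_inherited() and count_own_filters() are names
invented for this sketch, not proposed kernel interfaces. It also
assumes the parent's chain is still available at dump time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct seccomp_filter: each filter points
 * at the filter that was installed before it. */
struct filter {
    const struct filter *prev;
};

/* A filter counts as inherited if the very same object is reachable
 * from the parent's most recent filter. */
static bool is_inherited(const struct filter *f,
                         const struct filter *parent_tip)
{
    for (const struct filter *p = parent_tip; p != NULL; p = p->prev)
        if (p == f)
            return true;
    return false;
}

/* Number of filters a dump would have to save for this task: walk from
 * the task's newest filter and stop at the first inherited one. */
static size_t count_own_filters(const struct filter *task_tip,
                                const struct filter *parent_tip)
{
    size_t n = 0;

    for (const struct filter *f = task_tip; f != NULL; f = f->prev) {
        if (is_inherited(f, parent_tip))
            break;
        n++;
    }
    return n;
}
```

The walk mirrors the dump loop: everything newer than the first shared
filter belongs to the task itself, everything from that point down came
from the parent.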

If we support the above API, we don't need to think about the concept
of layers at all, or do any extra work on filter install to preserve
struct seccomp_filter identity; it just comes naturally.

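
To show why identity would fall out of installing a shared fd, here is a
small user-space model, again with invented names (struct filter,
struct task, install_filter, same_filter_state). It assumes one possible
answer to the open question about parents: a filter is bound to its
parent on first install, and later installs are refused unless the
caller's current filter is that same parent. That rule is an assumption
of this sketch, not something settled in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object behind a seccompfd. */
struct filter {
    struct filter *prev;   /* filter this one was stacked on */
    bool bound;            /* has prev been fixed by a first install? */
};

struct task {
    struct filter *tip;    /* most recently installed filter */
};

/* Model of SECCOMP_FD_INSTALL: returns 0 on success, -1 if the filter
 * is already bound to a different parent than the caller's tip. */
static int install_filter(struct task *t, struct filter *f)
{
    if (!f->bound) {
        f->prev = t->tip;
        f->bound = true;
    } else if (f->prev != t->tip) {
        return -1;
    }
    t->tip = f;
    return 0;
}

/* TSYNC-style check in this model: both tasks have the same object at
 * the top of their stacks, which implies identical chains below it. */
static bool same_filter_state(const struct task *a, const struct task *b)
{
    return a->tip == b->tip;
}
```

Two restored tasks that install the same sequence of fds end up with
pointer-equal filters, so no separate notion of layers is needed for the
comparison.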
Tycho
> Thoughts?
>
> --Andy