Re: v2 of seccomp filter c/r patches

From: Andy Lutomirski
Date: Fri Sep 11 2015 - 13:01:01 EST


On Fri, Sep 11, 2015 at 9:30 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Sep 10, 2015 5:22 PM, "Tycho Andersen" <tycho.andersen@xxxxxxxxxxxxx> wrote:
>>
>> Hi all,
>>
>> Here is v2 of the seccomp filter c/r set. The patch notes have individual
>> changes from the last series, but there are two points not noted:
>>
>> * The series still does not allow us to correctly restore state for programs
>> that will use SECCOMP_FILTER_FLAG_TSYNC in the future. Given that we want to
>> keep seccomp_filter's identity, I think something along the lines of another
>> seccomp command like SECCOMP_INHERIT_PARENT is needed (although I'm not sure
>> if this can even be done yet). In addition, we'll need a kcmp command for
>> figuring out if filters are the same, although this too needs to compare
>> seccomp_filter objects, so it's a little screwy. Any thoughts on how to do
>> this nicely are welcome.
>
> Let's add a concept of a seccompfd.
>
> For background of what I want to add: I want to be able to create a
> seccomp monitor. A seccomp monitor will be, logically, a pair of a
> struct file that represents the monitor and a seccomp_filter that is
> controlled by the monitor. Depending on flags, whoever holds the
> monitor fd could change the active filter, intercept syscalls, and
> issue syscalls on behalf of a process that is trapped in an
> intercepted syscall.
>
> Seccomp filters would nest properly.
>
> The interface would probably be (extremely pseudocoded):
>
> monitor_fd, filter_fd = seccomp(CREATE_MONITOR, flags, ...);
>
> Then, later:
>
> seccomp(ATTACH_TO_FILTER, filter_fd); /* now filtered */
>
> read(monitor_fd, buf, size); /* returns an intercepted syscall */
> write(monitor_fd, buf, size); /* issues a syscall or releases the
> trapped task */
>
> This can't be implemented on x86 without either going insane or
> finishing the massive set of pending cleanups to the x86 entry code.
> I favor the latter.
>
> We could, however, add part of it right now: we could have a way to
> create a filterfd, we could add kcmp support for it, and we could add
> the ATTACH_TO_FILTER thing. I think that would solve your problem.
>
> One major open question: does a filter_fd know what its parent is and,
> if so, will it just refuse to attach if the caller's parent is wrong?
> Or will a filter_fd attach anywhere?
>

Let me add one more thought:

Currently, struct seccomp_filter encodes a strict tree hierarchy: it
knows what its parent is. This only matters as an implementation
detail and because TSYNC checks for seccomp_filter equality.
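
To make the current model concrete, here is a minimal userspace sketch,
not the kernel code. The struct and its prev field are simplified
stand-ins for struct seccomp_filter, and is_ancestor() here is an
illustration of identity-based matching along the parent chain; treat
the details as assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct seccomp_filter: only the parent link. */
struct filter {
	struct filter *prev;	/* parent filter, NULL at the root */
};

/*
 * Returns true if 'parent' is NULL, is 'child' itself, or is reachable
 * by walking child->prev.  Matching is purely by pointer identity, so
 * two separately attached filters never match each other.
 */
static bool is_ancestor(const struct filter *parent,
			const struct filter *child)
{
	if (parent == NULL)
		return true;
	for (; child != NULL; child = child->prev)
		if (child == parent)
			return true;
	return false;
}
```

With this model, two children that each attach their own copy of an
identical program end up with distinct objects that do not match.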

We could change this without user-visible effects. We could say that,
for TSYNC purposes, two filter states match if they contain exactly
the same layers in the same order where a layer does *not* encode a
concept of parent. We could then say that attaching a classic bpf
filter creates a brand new layer that is not equal to any other layer
that's been created.

On its own, this has no effect whatsoever. The difference is that we could
declare that attaching the same ebpf program twice creates the *same*
layer so that, if you fork and both children attach the same ebpf
program, then they match for TSYNC purposes. Similarly, attaching the
same hypothetical filterfd would create the same layer.
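
A rough sketch of the layer comparison described above, in userspace C.
Everything here is hypothetical: struct layer, struct filter_state, the
layer_kind enum, and the helper names are made up for illustration and
do not exist in the kernel. It assumes each shareable object (eBPF
program or filterfd) has some stable numeric identity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: how a layer came to exist. */
enum layer_kind {
	LAYER_CLASSIC,	/* classic BPF attach: always unique */
	LAYER_SHARED,	/* eBPF program or filterfd: identity is shared */
};

/* Hypothetical: a layer has an identity but no parent pointer. */
struct layer {
	enum layer_kind kind;
	uint64_t id;
};

/* Hypothetical per-task state: layers in attach order. */
struct filter_state {
	const struct layer *layers;
	size_t nr_layers;
};

static uint64_t next_classic_id = 1;

/* Attaching a classic BPF filter mints a layer equal to no other. */
static struct layer layer_from_classic_bpf(void)
{
	struct layer l = { LAYER_CLASSIC, next_classic_id++ };
	return l;
}

/* Attaching the same shared object twice yields the same layer. */
static struct layer layer_from_shared_object(uint64_t object_id)
{
	struct layer l = { LAYER_SHARED, object_id };
	return l;
}

/* Two states match iff they hold the same layers in the same order. */
static bool states_match(const struct filter_state *a,
			 const struct filter_state *b)
{
	size_t i;

	if (a->nr_layers != b->nr_layers)
		return false;
	for (i = 0; i < a->nr_layers; i++)
		if (a->layers[i].kind != b->layers[i].kind ||
		    a->layers[i].id != b->layers[i].id)
			return false;
	return true;
}
```

Under these assumptions, two forked children that each attach the same
shared object on top of a common base compare as matching, while two
children that each attach a classic filter do not.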

Thoughts?

--Andy