Re: [PATCH v3 2/4] seccomp: Add SECCOMP_FILTER_FLAG_KILL_PROCESS

From: Kees Cook
Date: Fri Aug 11 2017 - 14:33:00 EST

In reply to: Tyler Hicks: "Re: [PATCH v3 2/4] seccomp: Add SECCOMP_FILTER_FLAG_KILL_PROCESS"
Next in thread: Kees Cook: "[PATCH v3 4/4] selftests/seccomp: Test thread vs process killing"

On Fri, Aug 11, 2017 at 9:58 AM, Tyler Hicks <tyhicks@xxxxxxxxxxxxx> wrote:
>> @@ -201,8 +203,25 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
>>  	 */
>>  	for (; f; f = f->prev) {
>>  		u32 cur_ret = BPF_PROG_RUN(f->prog, sd);
>> +		u32 action = cur_ret & SECCOMP_RET_ACTION;
>>
>> -		if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION)) {
>> +		/*
>> +		 * In order to distinguish between SECCOMP_RET_KILL and
>> +		 * "higher priority" synthetic SECCOMP_RET_KILL_PROCESS
>> +		 * identified by the kill_process filter flag, treat any
>> +		 * case as immediately stopping filter processing. No
>> +		 * higher priority action can exist, and we can't stop
>> +		 * on the first RET_KILL (which may not have set
>> +		 * f->kill_process) when a RET_KILL further up the filter
>> +		 * list may have f->kill_process set which would go
>> +		 * unnoticed.
>> +		 */
>> +		if (unlikely(action == SECCOMP_RET_KILL && f->kill_process)) {
>> +			*match = f;
>> +			return cur_ret;
>> +		}
>
> Why not let the application enforce this via the seccomp filter? In
> other words, the first filter loaded with
> SECCOMP_FILTER_FLAG_KILL_PROCESS set could have a rule in the filter
> that only allows seccomp(2) to be called in the future with the
> SECCOMP_FILTER_FLAG_KILL_PROCESS flag set.

I've been using the guide of "if SECCOMP_RET_KILL_PROCESS _did_ exist,
how would its semantics differ?"

In that magic world, it wouldn't be possible to create a seccomp
filter to screen out SECCOMP_RET_KILL_PROCESS. A single filter would
also be able to distinguish between the two states (see below).

> I understand the reasoning for wanting to enforce this automatically at
> the kernel level but I think mixing return action priorities with filter
> flags could be confusing and inflexible in the long run since filters
> are inherited and your parent's desire to kill the entire thread group
> may not mix with your desire to only kill a single thread.

Blocking the use of SECCOMP_FILTER_FLAG_KILL_PROCESS just means a
child can never perform a KILL_PROCESS, which doesn't really make much
sense, IMO.

The trouble may be that KILL_PROCESS would be used sparingly by either
parent or child, in the sense that maybe "unknown syscall gets
KILL_PROCESS, but 'connect' should just do KILL_THREAD". Or the
reverse. There isn't a way to mix combinations of return values across
filter chains without treating it exactly like a "real"
SECCOMP_RET_KILL_PROCESS would have worked. That means I have to treat
it as "higher priority" in the seccomp_run_filters() loop. Luckily that
is very cheap: the "unlikely(register == zero)" test is correctly
branch-predicted for the non-zero case, and the test itself is cheap
(we've already done the assignment which we need for the "<" test
below it, so it's a single pipelined instruction for the zero flag).
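
A minimal userspace sketch of the loop semantics described above, not
the kernel code itself. It assumes a hypothetical struct model_filter
that returns a fixed value in place of running a BPF program; all
model_* names are made up for illustration, while the SECCOMP_RET_*
values are the existing uapi constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Existing uapi values. */
#define SECCOMP_RET_KILL   0x00000000U
#define SECCOMP_RET_TRAP   0x00030000U
#define SECCOMP_RET_ALLOW  0x7fff0000U
#define SECCOMP_RET_ACTION 0x7fff0000U

/* Hypothetical stand-in for struct seccomp_filter. */
struct model_filter {
	uint32_t ret;              /* value this filter would return */
	int kill_process;          /* set via the proposed filter flag */
	struct model_filter *prev; /* next older filter in the chain */
};

/*
 * Walk the chain from the newest filter. The lowest action value wins,
 * except that RET_KILL from a kill_process filter ends the walk at once.
 */
static uint32_t model_run_filters(struct model_filter *f,
				  struct model_filter **match)
{
	uint32_t ret = SECCOMP_RET_ALLOW;

	assert(match != NULL);
	*match = NULL;

	for (; f; f = f->prev) {
		uint32_t cur_ret = f->ret;
		uint32_t action = cur_ret & SECCOMP_RET_ACTION;

		if (action == SECCOMP_RET_KILL && f->kill_process) {
			*match = f;
			return cur_ret;
		}
		if (action < (ret & SECCOMP_RET_ACTION)) {
			ret = cur_ret;
			*match = f;
		}
	}
	return ret;
}

/* 1 if the result is RET_KILL from a filter that set kill_process. */
static int model_kills_process(struct model_filter *newest)
{
	struct model_filter *match;
	uint32_t ret = model_run_filters(newest, &match);

	return (ret & SECCOMP_RET_ACTION) == SECCOMP_RET_KILL &&
	       match != NULL && match->kill_process;
}

/* Usage: parent set the flag, child did not, both return RET_KILL. */
static int model_example(void)
{
	struct model_filter parent = { SECCOMP_RET_KILL, 1, NULL };
	struct model_filter child  = { SECCOMP_RET_KILL, 0, &parent };

	/* The walk starts at the newest filter (the child). */
	return model_kills_process(&child);
}
```

Without the early return, the child's RET_KILL would be recorded first
and the parent's equal-valued RET_KILL would never replace it (the "<"
test is strict), so the parent's kill_process flag would go unnoticed.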

I don't expect to adjust KILL_THREAD vs KILL_PROCESS ever again, so
I'm not too worried about inflexibility.

What I don't get in this version is a _single_ filter being able to
distinguish between KILL_THREAD and KILL_PROCESS. Userspace is forced
to split up a rule if it wants to have different results. Also, a parent
_can_ stop a child from escalating their KILL_THREADs to KILL_PROCESS
via the filter you mentioned, which is weird.

I spent some time trying to use the high bit in the return, to make
this signed, and in the end it was much much more ugly, and I didn't
want to deal with the fallout to userspace which may suddenly have to
deal with unexpected bits in the BPF return:

basically s/u32/s32/ in __seccomp_filter() and seccomp_run_filters().
add #define SECCOMP_RET_ACTION_FULL 0xffff0000
add #define SECCOMP_RET_KILL_PROC 0x80000000

Then use SECCOMP_RET_ACTION_FULL to mask everything (after forcing a u32 cast).
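
A rough userspace sketch of that signed comparison, using the two
defines as written above. The helper names (action_prio, pick_ret,
fold_example) are hypothetical, and this only illustrates the ordering,
not the actual kernel change.

```c
#include <assert.h>
#include <stdint.h>

/* Existing uapi values. */
#define SECCOMP_RET_KILL        0x00000000U
#define SECCOMP_RET_TRAP        0x00030000U
#define SECCOMP_RET_ALLOW       0x7fff0000U
#define SECCOMP_RET_ACTION      0x7fff0000U
/* Names and values as proposed in this mail. */
#define SECCOMP_RET_ACTION_FULL 0xffff0000U
#define SECCOMP_RET_KILL_PROC   0x80000000U

/*
 * Mask as unsigned first, then reinterpret as signed so that the high
 * bit sorts below RET_KILL (0). The unsigned-to-signed conversion is
 * implementation-defined in ISO C before C23; this assumes the usual
 * two's complement wraparound (gcc, clang).
 */
static int32_t action_prio(uint32_t ret)
{
	return (int32_t)(ret & SECCOMP_RET_ACTION_FULL);
}

/* Keep whichever value has the higher priority (lower signed value). */
static uint32_t pick_ret(uint32_t cur, uint32_t best)
{
	return action_prio(cur) < action_prio(best) ? cur : best;
}

/* Usage: fold three filter results, starting from RET_ALLOW. */
static uint32_t fold_example(void)
{
	uint32_t ret = SECCOMP_RET_ALLOW;

	ret = pick_ret(SECCOMP_RET_TRAP, ret);
	ret = pick_ret(SECCOMP_RET_KILL_PROC, ret);
	ret = pick_ret(SECCOMP_RET_KILL, ret);

	/* Under the existing mask the new value is the same as RET_KILL. */
	assert((SECCOMP_RET_KILL_PROC & SECCOMP_RET_ACTION) == SECCOMP_RET_KILL);
	return ret;
}
```

With the signed view, 0x80000000 is the most negative value, so it wins
against every existing action without any special case in the loop.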

But the more I stare at this, the more I just want a value that
works correctly without totally crazy flags and things.

> Another way that this doesn't mix perfectly with the existing design is
> when the action is unknown. In that situation, we treat it as RET_KILL.
> However, this patch hard-codes the comparison with RET_KILL so we get
> into this situation where an unknown action is treated as RET_KILL
> except when the filter has the FILTER_FLAG_KILL_PROCESS flag set and
> then this short-circuit doesn't kick in. It is a corner case, for sure,
> but worth mentioning.

Hm, yeah, good point. This leaves unknown returns as KILL_THREAD, not
KILL_PROCESS.
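
The corner case can be seen by isolating the short-circuit test from
the quoted hunk. This is a small sketch, assuming an illustrative
action value MODEL_RET_UNKNOWN that none of the listed actions uses.
The variant that also catches unknown actions is hypothetical and was
not proposed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Existing uapi values. */
#define SECCOMP_RET_KILL   0x00000000U
#define SECCOMP_RET_TRAP   0x00030000U
#define SECCOMP_RET_ERRNO  0x00050000U
#define SECCOMP_RET_TRACE  0x7ff00000U
#define SECCOMP_RET_ALLOW  0x7fff0000U
#define SECCOMP_RET_ACTION 0x7fff0000U

/* Illustrative value that none of the actions above uses. */
#define MODEL_RET_UNKNOWN  0x00010000U

/* The test from the quoted hunk: only an exact RET_KILL short-circuits. */
static int patch_short_circuits(uint32_t cur_ret, int kill_process)
{
	uint32_t action = cur_ret & SECCOMP_RET_ACTION;

	return action == SECCOMP_RET_KILL && kill_process;
}

/* Hypothetical helper: is the action one of the values listed above? */
static int model_action_known(uint32_t action)
{
	switch (action) {
	case SECCOMP_RET_KILL:
	case SECCOMP_RET_TRAP:
	case SECCOMP_RET_ERRNO:
	case SECCOMP_RET_TRACE:
	case SECCOMP_RET_ALLOW:
		return 1;
	default:
		return 0;
	}
}

/* Hypothetical variant that treats unknown actions like RET_KILL. */
static int variant_short_circuits(uint32_t cur_ret, int kill_process)
{
	uint32_t action = cur_ret & SECCOMP_RET_ACTION;

	return kill_process &&
	       (action == SECCOMP_RET_KILL || !model_action_known(action));
}

/* Usage: an unknown action returned by a filter that set the flag. */
static void corner_case_example(void)
{
	assert(patch_short_circuits(SECCOMP_RET_KILL, 1) == 1);
	assert(patch_short_circuits(MODEL_RET_UNKNOWN, 1) == 0);
	assert(variant_short_circuits(MODEL_RET_UNKNOWN, 1) == 1);
}
```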

Let me spend some more time looking at the high bit version of this...

-Kees

--
Kees Cook
Pixel Security
