Re: eBPF / seccomp globals?

From: Andy Lutomirski
Date: Tue Sep 29 2015 - 20:08:03 EST


On Tue, Sep 29, 2015 at 4:44 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> On Thu, Sep 10, 2015 at 2:55 PM, Michael Tirado <mtirado418@xxxxxxxxx> wrote:
>> On Fri, Sep 4, 2015 at 8:37 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>>>
>>> Do you still need file capabilities with the availability of the new
>>> ambient capabilities?
>>>
>>> https://s3hh.wordpress.com/2015/07/25/ambient-capabilities/
>>> http://thread.gmane.org/gmane.linux.kernel.lsm/24034
>>
>> Ah.. thanks for the info on this, my launcher program could use ambient
>> capabilities if whoever invoked it already has that capability. I am trying
>> to have the new environment explicitly defined as a white list, and avoid
>> any type of privilege escalation not already granted by root user either
>> by filesystem mechanisms (setuid / file caps) or inheritable caps.
>>
>> I would still like to be able to launch programs with file capabilities since we
>> can lock those down with capability bounding set, and maybe even setuid
>> binaries too (with a hefty warning message). This rules out LD_PRELOAD
>> for me, and also some linkers may not support it at all.
>>
>>
>>
>>> On the TODO list is
>>> doing deep argument inspection, but it is not an easy thing to get
>>> right. :)
>>
>> Yes, please do not rush such a thing!! It might even be a can of worms
>> not worth opening.
>>
>>
>>
>> In case anyone is wondering what I am doing for-now(tm) while waiting for
>> eBPF map support, or some other way to deal with this problem: I have crafted
>> a very hacky patch to work around the issue that will allow 2 system calls to
>> pass through before the filter program is run. I'm lazily using google
>> webmail so,
>> sorry if the tabs are missing :(
>
> I
>
>>
>>
>>
>> From: Michael R. Tirado <mtirado418@xxxxxxxxx>
>> Date: Thu, 10 Sep 2015 08:28:41 +0000
>> Subject: [PATCH] Add new seccomp filter mode + flag to allow two syscalls to
>> pass before the filter is run. allows a launcher program to setuid(drop caps)
>> and exec if those two privileges are not granted in seccomp filter whitelist.
>>
>> DISCLAIMER:
>> I am doing this as a quick temporary workaround to this complex problem.
>> Also, there may be a more efficient way to implement it instead of
>> branching in the filter loop.
>> ---
>> include/linux/seccomp.h | 2 +-
>> include/uapi/linux/seccomp.h | 2 ++
>> kernel/seccomp.c | 23 ++++++++++++++++++++---
>> 3 files changed, 23 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
>> index a19ddac..5547448c 100644
>> --- a/include/linux/seccomp.h
>> +++ b/include/linux/seccomp.h
>> @@ -3,7 +3,7 @@
>>
>> #include <uapi/linux/seccomp.h>
>>
>> -#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC)
>> +#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC |
>> SECCOMP_FILTER_FLAG_DEFERRED)
>>
>> #ifdef CONFIG_SECCOMP
>>
>> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
>> index 0f238a4..43a8fb8 100644
>> --- a/include/uapi/linux/seccomp.h
>> +++ b/include/uapi/linux/seccomp.h
>> @@ -9,6 +9,7 @@
>> #define SECCOMP_MODE_DISABLED 0 /* seccomp is not in use. */
>> #define SECCOMP_MODE_STRICT 1 /* uses hard-coded filter. */
>> #define SECCOMP_MODE_FILTER 2 /* uses user-supplied filter. */
>> +#define SECCOMP_MODE_FILTER_DEFERRED 3 /* sets filter mode + deferred flag */
>>
>> /* Valid operations for seccomp syscall. */
>> #define SECCOMP_SET_MODE_STRICT 0
>> @@ -16,6 +17,7 @@
>>
>> /* Valid flags for SECCOMP_SET_MODE_FILTER */
>> #define SECCOMP_FILTER_FLAG_TSYNC 1
>> +#define SECCOMP_FILTER_FLAG_DEFERRED 2 /* grant two unfiltered syscalls */
>>
>> /*
>> * All BPF programs must return a 32-bit value.
>> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>> index 245df6b..dc2a5af 100644
>> --- a/kernel/seccomp.c
>> +++ b/kernel/seccomp.c
>> @@ -58,6 +58,7 @@ struct seccomp_filter {
>> atomic_t usage;
>> struct seccomp_filter *prev;
>> struct bpf_prog *prog;
>> + unsigned int deferred;
>> };
>>
>> /* Limit any path through the tree to 256KB worth of instructions. */
>> @@ -196,7 +197,12 @@ static u32 seccomp_run_filters(struct seccomp_data *sd)
>> * value always takes priority (ignoring the DATA).
>> */
>> for (; f; f = f->prev) {
>> - u32 cur_ret = BPF_PROG_RUN(f->prog, (void *)sd);
>> + u32 cur_ret;
>> + if (unlikely(f->deferred)) {
>> + --f->deferred;
>> + continue;
>> + }
>> + cur_ret = BPF_PROG_RUN(f->prog, (void *)sd);
>
> I do like the idea of a deferred filters, but I wouldn't want them
> checked this way (it adds a fixed cost to all filters). I would rather
> that a separate list of filters exist, waiting for something like
> "exec" to trigger them getting added to the actual filter list. In
> fact, probably you'd want something like "prepare" and "cancel" too,
> then you could "prepare", fork, exec. Then the child would get the
> prepared filters added, and the parent could call "cancel" to drop the
> prepared filters from itself.
>
> Andy, Will, do you see issues with a class of "deferred" filters that
> would be attached after exec?

No security issue, but it's not beautiful. What prevents nasty races
if you allow fork()?

We could consider more generally allowing limited stateful filters
using eBPF. Then we'd change the state in exec using eBPF code.
Before we do any of the above, I think we need to nail down just what
the identity of a filter is, though. With eBPF, it's possible to
attach the same program to two tasks.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/