Re: eBPF / seccomp globals?

From: Michael Tirado
Date: Thu Sep 10 2015 - 17:55:59 EST


On Fri, Sep 4, 2015 at 8:37 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>
> Do you still need file capabilities with the availability of the new
> ambient capabilities?
>
> https://s3hh.wordpress.com/2015/07/25/ambient-capabilities/
> http://thread.gmane.org/gmane.linux.kernel.lsm/24034

Ah, thanks for the info on this. My launcher program could use ambient
capabilities if whoever invoked it already has that capability. I am trying
to define the new environment explicitly as a whitelist, and to avoid any
privilege escalation not already granted by the root user, either through
filesystem mechanisms (setuid / file caps) or inheritable caps.
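For reference, the ambient-capability route Kees mentions could look roughly like the sketch below. It is a minimal sketch, assuming a kernel with ambient capabilities (merged for Linux 4.3) and headers that define PR_CAP_AMBIENT; `keep_cap_across_exec` is a hypothetical helper name. It can only succeed when the capability is already in the caller's permitted set, which matches the "if whoever invoked it already has that capability" constraint: a capability may only be raised in the ambient set if it is both permitted and inheritable.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/capability.h>

/*
 * Hypothetical helper: keep capability `cap` across execve() of an
 * unprivileged program by adding it to the inheritable set and then to
 * the ambient set. Returns 0 on success, -1 with errno set on failure
 * (EPERM if `cap` is not in the permitted set).
 */
static int keep_cap_across_exec(int cap)
{
	struct __user_cap_header_struct hdr = {
		.version = _LINUX_CAPABILITY_VERSION_3,
		.pid = 0, /* 0 means the calling thread */
	};
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

	if (syscall(SYS_capget, &hdr, data) != 0)
		return -1;

	/* Ambient caps must be both permitted and inheritable. */
	data[cap >> 5].inheritable |= 1U << (cap & 31);
	if (syscall(SYS_capset, &hdr, data) != 0)
		return -1;

	return prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, cap, 0, 0);
}
```

An unprivileged caller gets EPERM from capset(); a caller that holds the capability ends up with it in the ambient set, so it survives exec of a plain (non-setuid, no file caps) binary.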

I would still like to be able to launch programs with file capabilities, since
we can lock those down with the capability bounding set, and maybe even setuid
binaries too (with a hefty warning message). This rules out LD_PRELOAD for me,
since the dynamic loader restricts it for setuid and file-capability binaries,
and some linkers may not support it at all.



> On the TODO list is
> doing deep argument inspection, but it is not an easy thing to get
> right. :)

Yes, please do not rush such a thing!! It might even be a can of worms
not worth opening.



In case anyone is wondering what I am doing for-now(tm) while waiting for
eBPF map support, or some other way to deal with this problem: I have crafted
a very hacky patch to work around the issue that allows two system calls to
pass through before the filter program is run. I'm lazily using Google
webmail, so sorry if the tabs are missing :(
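To make the intended semantics concrete, here is a small userspace model of the filter walk that the patch below changes. It is not kernel code: `struct model_filter`, `model_run_filters`, `allow_only_1`, and the `MODEL_RET_*` constants are hypothetical stand-ins (the constants copy the values of the uapi `SECCOMP_RET_KILL`, `SECCOMP_RET_ALLOW`, and `SECCOMP_RET_ACTION`), and a plain C function stands in for the BPF program. As in the patch, a non-zero `deferred` budget makes the walk skip that filter and decrement the budget; every other filter still runs, and the lowest action value wins.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors of the uapi SECCOMP_RET_* values (model only). */
#define MODEL_RET_KILL   0x00000000U
#define MODEL_RET_ALLOW  0x7fff0000U
#define MODEL_RET_ACTION 0x7fff0000U

struct model_filter {
	uint32_t (*prog)(int nr);   /* stands in for the BPF program */
	unsigned int deferred;      /* syscalls still allowed to skip this filter */
	struct model_filter *prev;  /* older filter in the chain */
};

/* Walk newest-to-oldest; the lowest action value wins, as in the kernel. */
static uint32_t model_run_filters(struct model_filter *f, int nr)
{
	uint32_t ret = MODEL_RET_ALLOW;

	for (; f; f = f->prev) {
		uint32_t cur_ret;

		if (f->deferred) {
			--f->deferred;  /* skip this filter, consume budget */
			continue;
		}
		cur_ret = f->prog(nr);
		if ((cur_ret & MODEL_RET_ACTION) < (ret & MODEL_RET_ACTION))
			ret = cur_ret;
	}
	return ret;
}

/* Hypothetical whitelist: only syscall number 1 is allowed. */
static uint32_t allow_only_1(int nr)
{
	return nr == 1 ? MODEL_RET_ALLOW : MODEL_RET_KILL;
}
```

With a budget of 2 on the newest filter, the first two calls (setuid and execve in the launcher case) skip it, while any older filter in the chain is still consulted; the budget is consumed even when an older filter ends up killing the call. In the patch the counter lives in `struct seccomp_filter`, which the kernel shares by reference across fork(), so tasks sharing the filter also share the budget; the model does not capture that.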



From: Michael R. Tirado <mtirado418@xxxxxxxxx>
Date: Thu, 10 Sep 2015 08:28:41 +0000
Subject: [PATCH] Add new seccomp filter mode + flag to allow two syscalls to
pass before the filter is run. This allows a launcher program to setuid (drop
caps) and exec even if those two syscalls are not granted in the seccomp
filter whitelist.

DISCLAIMER:
I am doing this as a quick temporary workaround to this complex problem.
Also, there may be a more efficient way to implement it instead of
branching in the filter loop.
---
include/linux/seccomp.h | 2 +-
include/uapi/linux/seccomp.h | 2 ++
kernel/seccomp.c | 23 ++++++++++++++++++++---
3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index a19ddac..5547448c 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -3,7 +3,7 @@

 #include <uapi/linux/seccomp.h>
 
-#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC)
+#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_DEFERRED)
 
 #ifdef CONFIG_SECCOMP

diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 0f238a4..43a8fb8 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -9,6 +9,7 @@
 #define SECCOMP_MODE_DISABLED 0 /* seccomp is not in use. */
 #define SECCOMP_MODE_STRICT 1 /* uses hard-coded filter. */
 #define SECCOMP_MODE_FILTER 2 /* uses user-supplied filter. */
+#define SECCOMP_MODE_FILTER_DEFERRED 3 /* sets filter mode + deferred flag */
 
 /* Valid operations for seccomp syscall. */
 #define SECCOMP_SET_MODE_STRICT 0
@@ -16,6 +17,7 @@
 
 /* Valid flags for SECCOMP_SET_MODE_FILTER */
 #define SECCOMP_FILTER_FLAG_TSYNC 1
+#define SECCOMP_FILTER_FLAG_DEFERRED 2 /* grant two unfiltered syscalls */
 
 /*
  * All BPF programs must return a 32-bit value.
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 245df6b..dc2a5af 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -58,6 +58,7 @@ struct seccomp_filter {
 	atomic_t usage;
 	struct seccomp_filter *prev;
 	struct bpf_prog *prog;
+	unsigned int deferred;
 };
 
 /* Limit any path through the tree to 256KB worth of instructions. */
@@ -196,7 +197,12 @@ static u32 seccomp_run_filters(struct seccomp_data *sd)
 	 * value always takes priority (ignoring the DATA).
 	 */
 	for (; f; f = f->prev) {
-		u32 cur_ret = BPF_PROG_RUN(f->prog, (void *)sd);
+		u32 cur_ret;
+		if (unlikely(f->deferred)) {
+			--f->deferred;
+			continue;
+		}
+		cur_ret = BPF_PROG_RUN(f->prog, (void *)sd);
 
 		if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
 			ret = cur_ret;
@@ -444,6 +450,14 @@ static long seccomp_attach_filter(unsigned int flags,
 	}
 
 	/*
+	 * in certain cases we may wish to defer filtering, and allow some
+	 * syscalls. eg, a launcher program will setuid(drop caps) then exec.
+	 */
+	if (flags & SECCOMP_FILTER_FLAG_DEFERRED) {
+		filter->deferred = 2;
+	}
+
+	/*
 	 * If there is an existing filter, make it the prev and don't drop its
 	 * task reference.
 	 */
@@ -838,6 +852,7 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
 {
 	unsigned int op;
 	char __user *uargs;
+	unsigned int flags = 0;
 
 	switch (seccomp_mode) {
 	case SECCOMP_MODE_STRICT:
@@ -849,6 +864,9 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
 		 */
 		uargs = NULL;
 		break;
+	/* set flag (older kernels lack seccomp syscall), then fall through */
+	case SECCOMP_MODE_FILTER_DEFERRED:
+		flags = SECCOMP_FILTER_FLAG_DEFERRED;
 	case SECCOMP_MODE_FILTER:
 		op = SECCOMP_SET_MODE_FILTER;
 		uargs = filter;
@@ -857,6 +875,5 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
 		return -EINVAL;
 	}
 
-	/* prctl interface doesn't have flags, so they are always zero. */
-	return do_seccomp(op, 0, uargs);
+	return do_seccomp(op, flags, uargs);
 }
--
1.8.4
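A launcher using the mode added by this patch might look roughly like the sketch below. It assumes a kernel with the patch applied: `SECCOMP_MODE_FILTER_DEFERRED` (value 3) exists only in this patch, and a mainline kernel rejects it with EINVAL. `install_deferred_filter`, `launch`, and the two-syscall whitelist are hypothetical. Because no_new_privs is not set (so setuid and file-capability binaries keep working), installing the filter requires CAP_SYS_ADMIN, which a root launcher normally has.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#ifndef SECCOMP_MODE_FILTER_DEFERRED
#define SECCOMP_MODE_FILTER_DEFERRED 3 /* from the patch above, not mainline */
#endif

/* Returns 0 on success, -1 with errno set (EINVAL on unpatched kernels). */
static int install_deferred_filter(void)
{
	/*
	 * Hypothetical placeholder whitelist: write and exit_group only.
	 * A real filter must also check seccomp_data.arch and allow far
	 * more syscalls than this.
	 */
	struct sock_filter insns[] = {
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 2, 0),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit_group, 1, 0),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = sizeof(insns) / sizeof(insns[0]),
		.filter = insns,
	};

	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER_DEFERRED, &prog, 0, 0);
}

/*
 * Install the filter, drop to `uid`, then exec `path`. On a patched
 * kernel, setuid() and execve() are the two deferred syscalls.
 * Only returns on failure.
 */
static int launch(uid_t uid, const char *path, char *const argv[],
		  char *const envp[])
{
	if (install_deferred_filter() != 0)
		return -1;
	if (setuid(uid) != 0)
		return -1;
	execve(path, argv, envp);
	return -1;
}
```

On a stock kernel both functions fail cleanly with EINVAL before any syscall is filtered, so the sketch can be compiled and run anywhere without side effects.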