Re: [PATCH 5/7] seccomp_filter: Document what seccomp_filter is and how it works.

From: Randy Dunlap
Date: Thu Apr 28 2011 - 11:46:29 EST


On Wed, 27 Apr 2011 22:08:49 -0500 Will Drewry wrote:

> Adds a text file covering what CONFIG_SECCOMP_FILTER is, how it is
> implemented presently, and what it may be used for. In addition,
> the limitations and caveats of the proposed implementation are
> included.
>
> Signed-off-by: Will Drewry <wad@xxxxxxxxxxxx>
> ---
> Documentation/trace/seccomp_filter.txt | 75 ++++++++++++++++++++++++++++++++
> 1 files changed, 75 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/trace/seccomp_filter.txt
>
> diff --git a/Documentation/trace/seccomp_filter.txt b/Documentation/trace/seccomp_filter.txt
> new file mode 100644
> index 0000000..6a0fd33
> --- /dev/null
> +++ b/Documentation/trace/seccomp_filter.txt
> @@ -0,0 +1,75 @@
> + Seccomp filtering
> + =================
> +
> +Introduction
> +------------
> +
> +A large number of system calls are exposed to every userland process
> +with many of them going unused for the entire lifetime of the
> +application. As system calls change and mature, bugs are found and
> +quashed. A certain subset of userland applications benefit by having
> +a reduce set of available system calls. The reduced set reduces the

reduced

> +total kernel surface exposed to the application. System call filtering
> +is meant for use with those applications.
> +
> +The implementation currently leverages both the existing seccomp
> +infrastructure and the kernel tracing infrastructure. By centralizing
> +hooks for attack surface reduction in seccomp, it is possible to assure
> +attention to security that is less relevant in normal ftrace scenarios,
> +such as time of check, time of use attacks. However, ftrace provides a
> +rich, human-friendly environment for specifying system calls by name and
> +expected arguments. (As such, this requires FTRACE_SYSCALLS.)
> +
> +
> +What it isn't
> +-------------
> +
> +System call filtering isn't a sandbox. It provides a clearly defined
> +mechanism for minimizing the exposed kernel surface. Beyond that, policy for
> +logical behavior and information flow should be managed with an LSM of your
> +choosing.
> +
> +
> +Usage
> +-----
> +
> +An additional seccomp mode is exposed through mode '2'. This mode
> +depends on CONFIG_SECCOMP_FILTER which in turn depends on
> +CONFIG_FTRACE_SYSCALLS.
> +
> +A collection of filters may be supplied via prctl, and the current set of
> +filters is exposed in /proc/<pid>/seccomp_filter.
> +
> +For instance,
> + const char filters[] =
> + "sys_read: (fd == 1) || (fd == 2)\n"
> + "sys_write: (fd == 0)\n"
> + "sys_exit: 1\n"
> + "sys_exit_group: 1\n"
> + "on_next_syscall: 1";
> + prctl(PR_SET_SECCOMP, 2, filters);
> +
> +This will setup system call filters for read, write, and exit where reading can
> +be done only from fds 1 and 2 and writing to fd 0. The "on_next_syscall" directive tells
> +seccomp to not enforce the ruleset until after the next system call is run. This allows
> +for launchers to apply system call filters to a binary before executing it.
> +
> +Once enabled, the access may only be reduced. For example, a set of filters may be:
> +
> + sys_read: 1
> + sys_write: 1
> + sys_mmap: 1
> + sys_prctl: 1
> +
> +Then it may call the following to drop mmap access:
> + prctl(PR_SET_SECCOMP, 2, "sys_mmap: 0");
> +
> +
> +Caveats
> +-------
> +
> +The system call names come from ftrace events. At present, many system
> +calls are not hooked - such as x86's ptregs wrapped system calls.
> +
> +In addition compat_task()s will not be supported until a sys32s begin
> +being hooked.

Last sentence is hard to read IMO:
a. what are compat_task()s?
b. what is a sys32s begin?
c. The wording is awkward; maybe change it to: "until a sys32s begin has been hooked."


thanks,
---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***