Re: Seccomp questions for updates to seccomp(2) man page

From: Michael Kerrisk (man-pages)
Date: Sat Sep 05 2015 - 03:02:12 EST

Hi Kees,

On 08/27/2015 06:32 AM, Kees Cook wrote:
> On Wed, Aug 26, 2015 at 6:42 PM, Michael Kerrisk (man-pages)
> <mtk.manpages@xxxxxxxxx> wrote:
>> Hello Kees, Will,
>> In recent times I've been asked a couple of questions about seccomp(),
>> and it seems like it would be worthwhile to include these topics in
>> the seccomp(2) man page. Would you be able to help out with some
>> answers?
>> === Use of the instruction pointer in seccomp filters ===
>> The seccomp_data describing the system call includes the process's
>> instruction pointer value. What use can be made of this information?
> Will may have some other history to add here, but it seemed like it
> was a handy thing to add, as it's a dynamic value attached to the
> execution environment. I'm actually not aware of any programs that
> build filters with reference to it.
>> My best guess is that you can use this information in conjunction with
>> /proc/PID/maps to introspect the process layout and thus construct
>> filters that conditionally operate based on which DSO is performing a
>> system call. Is that a reasonable use case? Are there others?
> That's reasonable. Filters limiting syscalls to certain memory ranges
> would likely also want to lock down mmap and mprotect calls, to stop
> anything malicious from trying to sneak into the protected range.

Thanks. I've added this text to the page:

The instruction_pointer field provides the address of the
machine-language instruction that performed the system call.
This might be useful in conjunction with the use of
/proc/[pid]/maps to perform checks based on which region (mapping)
of the program made the system call. (Probably, it is wise
to lock down the mmap(2) and mprotect(2) system calls to prevent
the program from subverting such checks.)
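
To make the /proc/[pid]/maps idea a little more concrete, here is a minimal user-space sketch (not text from the page). It only shows how the address range of one mapping could be read from a maps line and compared against an instruction pointer value; a real filter would have to perform the equivalent comparison in BPF on the instruction_pointer field. The names struct map_range, parse_maps_line() and ip_in_range() are hypothetical helpers invented for this illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: the address range of one mapping. */
struct map_range {
    uint64_t start;     /* first address in the mapping (inclusive) */
    uint64_t end;       /* first address past the mapping (exclusive) */
    int executable;     /* nonzero if the permissions field contains 'x' */
};

/*
 * Parse one line in /proc/[pid]/maps format, for example:
 *   "00400000-0040b000 r-xp 00000000 08:01 131 /bin/cat"
 * Only the address range and the permissions are extracted.
 * Returns 0 on success, -1 if the line does not match.
 */
int parse_maps_line(const char *line, struct map_range *out)
{
    char perms[5];      /* four permission characters plus NUL */

    assert(line != NULL && out != NULL);
    if (sscanf(line, "%" SCNx64 "-%" SCNx64 " %4s",
               &out->start, &out->end, perms) != 3)
        return -1;
    out->executable = (strchr(perms, 'x') != NULL);
    return 0;
}

/* Return nonzero if ip lies inside the half-open range [start, end). */
int ip_in_range(uint64_t ip, const struct map_range *r)
{
    assert(r != NULL);
    return ip >= r->start && ip < r->end;
}
```

The range is treated as half-open because the second address on a maps line is the first byte after the mapping. As noted in the page text, such a check is only meaningful if the program cannot change its own mappings afterwards.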

>> === Chained seccomp filters and SECCOMP_RET_KILL ===
>> The man page describes the behavior when multiple filters are installed:
>> If multiple filters exist, they are all executed, in reverse
>> order of their addition to the filter tree (i.e., the most
>> recently installed filter is executed first). The return value
>> for the evaluation of a given system call is the first-seen
>> SECCOMP_RET_ACTION value of highest precedence (along with its
>> accompanying data) returned by execution of all of the filters.
>> The question is: suppose one of the early filters returns
>> SECCOMP_RET_KILL (which is the highest priority action), what is the
>> purpose of executing the remaining filters? My best guess is that this is
>> about preventing the user from discovering which filter rule causes
>> the sandboxed program to fail. Is this correct, or is there another
>> reason?
> It's just because it would be an optimization that would only speed up
> the RET_KILL case, but it's the uncommon one and the one that doesn't
> benefit meaningfully from such a change (you need to kill the process
> really quickly?). We would speed up killing a program at the (albeit
> tiny) expense to all other filtered programs. Best to keep the filter
> execution logic clear, simple, and as fast as possible for all
> filters.

Ahh -- that makes sense. Perhaps it is excessive, but I've noted this
in the page, since I've run across people puzzled by this behavior,
and I recall myself being puzzled about it when I noticed it as well:

(Note that all filters will be called even if one of the
earlier filters returns SECCOMP_RET_KILL. This is done to
simplify the kernel code and to provide a tiny speed-up in
the execution of sets of filters by avoiding a check for
this uncommon case.)
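
For readers who find the precedence rule easier to follow as code, here is a small user-space model of it (a sketch, not the kernel source). It assumes the numeric action values from <linux/seccomp.h> as they stand at the time of writing, copied under a hypothetical SK_ prefix so that the example builds without kernel headers; combine_filter_results() is likewise a name invented for this illustration. The results array is taken to be in execution order, that is, most recently installed filter first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the seccomp return values (assumed, see lead-in). */
#define SK_RET_KILL   0x00000000U
#define SK_RET_TRAP   0x00030000U
#define SK_RET_ERRNO  0x00050000U
#define SK_RET_TRACE  0x7ff00000U
#define SK_RET_ALLOW  0x7fff0000U
#define SK_RET_ACTION 0x7fff0000U   /* mask selecting the action part */

/*
 * Model of how the per-filter results are combined: every result is
 * examined, and a result replaces the current one only if its action
 * has a strictly lower numeric value (that is, higher precedence).
 * Because the comparison is strict, the first-seen value of the
 * winning action is kept together with its data bits.
 */
uint32_t combine_filter_results(const uint32_t *results, size_t n)
{
    uint32_t ret = SK_RET_ALLOW;    /* lowest-precedence starting point */

    assert(results != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* No early exit, not even after SK_RET_KILL has been seen. */
        if ((results[i] & SK_RET_ACTION) < (ret & SK_RET_ACTION))
            ret = results[i];
    }
    return ret;
}
```

In this model, stopping at the first SK_RET_KILL would need an extra test on every iteration, which is the check that the note above says is avoided.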



Michael Kerrisk
Linux man-pages maintainer;
Linux/UNIX System Programming Training: