Re: seccomp: dump core when using SECCOMP_RET_KILL

From: Kees Cook
Date: Fri Jan 27 2017 - 16:48:41 EST

On Wed, Jan 25, 2017 at 12:05 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> On Tue, Jan 24, 2017 at 4:53 PM, Andrei Vagin <avagin@xxxxxxxxxxxxx> wrote:
>> Hi,
>> One of CRIU tests fails with this patch:
>> Before this patch only a thread which called a "wrong" syscall is killed.
>> Now a whole process is killed if one of threads called a "wrong" syscall.
> Oh ew. I wonder what is causing this? In other do_coredump() callers,
> they explicitly call do_group_exit(). Hmmm

We need to find a way to fix this or remove the coredump change from
-next. We have a few things that have come up recently (coincident
with the coredump change):

- some folks would like seccomp kills to kill the entire process not
just the thread
- on a full-process kill, there needs to be a way to get a coredump
- on a kill, it would be nice to have reliable logging

Getting a coredump requires a full-process kill. It is possible to do
this already with RET_TRAP and just not catch the SIGSYS. However,
this isn't sufficient if you want to be _sure_ the entire process gets
killed since RET_TRAP depends on cooperation from the process.

Getting reliable logging out of seccomp for non-RET_KILL is
non-trivial because syscall-audit doesn't track forks.

The RET_* values are part of the UAPI, so changing or adding to them
requires care.

Right now we have very little room in the RET_* values (the lower
bytes are for the RET_DATA which is ignored for RET_KILL and the
semantics of changing that is very difficult):

#define SECCOMP_RET_KILL 0x00000000U /* kill the task immediately */
#define SECCOMP_RET_TRAP 0x00030000U /* disallow and force a SIGSYS */

Killing the entire process is more aggressive than RET_KILL currently,
so the question becomes, should we upgrade RET_KILL to
RET_KILL_PROCESS and add RET_KILL_THREAD? Are there people that WANT
only a thread to be killed? Andrei, does CRIU depend on this behavior,
or is it "just" a regression test detail?

We can add two more RET_* values... however, it seems like only thread
vs process is a filter-level concept. Whether or not to dump core
seems to be an external aspect of the process (think ulimit -c), and
logging seems to be a system-level thing.

For logging, I think audit needs to grow fork-tracking, and/or have a
new "is under seccomp" test that can be exposed to auditctl. Then the
system owner can issue either "tell me about all seccomp kills" or
"tell me about seccomp kills in this process tree". As such, I don't
think we should be making filter-level changes to deal with the needs
of seccomp logging.

For coredumping, I remain a bit stumped. Strictly speaking,
coredumping exposes more kernel code to the running process, so it is
"less safe" than the instant kill RET_KILL performs. Though the
current patch is actually more aggressive in that it causes the entire
process to die as part of the coredumping.

I think that if there is a move to make RET_KILL kill the process,
then we can add coredumping. If not, I'm less sure how to proceed...


Kees Cook
Nexus Security