Re: seccomp: dump core when using SECCOMP_RET_KILL

From: Andy Lutomirski
Date: Tue Jan 31 2017 - 17:00:59 EST

On Fri, Jan 27, 2017 at 1:48 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> On Wed, Jan 25, 2017 at 12:05 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>> On Tue, Jan 24, 2017 at 4:53 PM, Andrei Vagin <avagin@xxxxxxxxxxxxx> wrote:
>>> Hi,
>>> One of CRIU tests fails with this patch:
>>> Before this patch only a thread which called a "wrong" syscall is killed.
>>> Now a whole process is killed if one of threads called a "wrong" syscall.
>> Oh ew. I wonder what is causing this? In other do_coredump() callers,
>> they explicitly call do_group_exit(). Hmmm
> We need to find a way to fix this or remove the coredump change from
> -next. We have a few things that have come up recently (coincident
> with the coredump change):
> - some folks would like seccomp kills to kill the entire process not
> just the thread
> - on a full-process kill, there needs to be a way to get a coredump
> - on a kill, it would be nice to have reliable logging
> Getting a coredump requires a full-process kill. It is possible to do
> this already with RET_TRAP and just not catch the SIGSYS. However,
> this isn't sufficient if you want to be _sure_ the entire process gets
> killed since RET_TRAP depends on cooperation from the process.
> Getting reliable logging out of seccomp for non-RET_KILL is
> non-trivial because syscall-audit doesn't track forks.
> The RET_* values are part of the UAPI, so changing or adding to them
> requires care.
> Right now we have very little room in the RET_* values (the lower
> bytes are for the RET_DATA which is ignored for RET_KILL and the
> semantics of changing that is very difficult):
> #define SECCOMP_RET_KILL 0x00000000U /* kill the task immediately */
> #define SECCOMP_RET_TRAP 0x00030000U /* disallow and force a SIGSYS */
> Killing the entire process is more aggressive than RET_KILL currently,
> so the question becomes, should we upgrade RET_KILL to
> RET_KILL_PROCESS and add RET_KILL_THREAD? Are there people that WANT
> only a thread to be killed? Andrei, does CRIU depend on this behavior,
> or is it "just" a regression test detail?

I think we actually have more flexibility: we have all the low bits --
they're currently ignored. We could plausibly have one code for "kill
this thread, no dump", one code for "kill this process, dump core",
and maybe one code for "kill this process, don't dump core".

I would be a bit surprised if anyone uses codes between 0x1 and 0xffff
inclusive, so changing the semantics of those codes ought to be safe.
(Or we could gate them behind a flag.) I'm not sure how I feel about
changing the semantics of 0.