[PATCH 0/9] Refactoring exit

From: Eric W. Biederman
Date: Thu Jun 24 2021 - 14:58:44 EST



I dug into exit because PTRACE_EVENT_EXIT is not guaranteed to be
reported from a context whose stack lets ptrace read and write all of
the userspace registers. When it is not, ptrace can end up performing
unfiltered reads and writes of kernel stack contents.
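
For readers less familiar with the stop in question, here is a minimal
userspace sketch of a tracer that observes PTRACE_EVENT_EXIT and reads
the tracee's registers while it is stopped there. It is not part of this
series and only illustrates the ptrace side of the interface. The
function name exit_code_seen_by_tracer and the size of the register
buffer are assumptions made for the example.

```c
#define _GNU_SOURCE
#include <elf.h>         /* NT_PRSTATUS */
#include <signal.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>     /* struct iovec */
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that calls _exit(code), trace it,
 * read its general purpose registers at the PTRACE_EVENT_EXIT stop and
 * return the exit code reported there. Returns -1 on any failure.
 */
int exit_code_seen_by_tracer(int code)
{
	uint64_t regs[512];	/* assumed to be large enough for NT_PRSTATUS */
	struct iovec iov = { .iov_base = regs, .iov_len = sizeof(regs) };
	unsigned long msg = 0;
	int status;
	pid_t pid;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: ask to be traced, then stop so the parent can set options. */
		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
			_exit(127);
		raise(SIGSTOP);
		_exit(code);
	}

	/* Wait for the initial SIGSTOP. */
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (!WIFSTOPPED(status))
		return -1;	/* child already exited and was reaped */

	/* Request a stop just before the tracee finishes exiting. */
	if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
		   (void *)(long)PTRACE_O_TRACEEXIT) == -1)
		goto fail;
	if (ptrace(PTRACE_CONT, pid, NULL, NULL) == -1)
		goto fail;

	/* The next stop should be the PTRACE_EVENT_EXIT stop. */
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (!WIFSTOPPED(status))
		return -1;
	if ((status >> 8) != (SIGTRAP | (PTRACE_EVENT_EXIT << 8)))
		goto fail;

	/* This is the register access the cover letter is concerned with. */
	if (ptrace(PTRACE_GETREGSET, pid, (void *)(long)NT_PRSTATUS, &iov) == -1)
		goto fail;

	/* For PTRACE_EVENT_EXIT the event message is the wait-style exit status. */
	if (ptrace(PTRACE_GETEVENTMSG, pid, NULL, &msg) == -1)
		goto fail;

	/* Let the tracee finish exiting and reap it. */
	ptrace(PTRACE_CONT, pid, NULL, NULL);
	waitpid(pid, &status, 0);
	return WEXITSTATUS((int)msg);

fail:
	kill(pid, SIGKILL);
	waitpid(pid, &status, 0);
	return -1;
}
```

Calling exit_code_seen_by_tracer(42) from a small test program should
return 42 when ptrace is permitted in the environment. The registers
are read but not interpreted, since their layout is architecture
specific.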

While looking into it I realized that there are a lot of little races
between all of the ways an exit can be initiated. I don't know of a way
those races are harmful, but they make the code difficult to reason about.

The solution this set of changes adopts is to implement good primitives
for asynchronous exit and exit_group requests, and to modify exit(2)
and exit_group(2) to use those primitives.

The result should be more consistent determination of the reason for an
exit, as well as PTRACE_EVENT_EXIT always being called from a context
(get_signal) where ptrace is guaranteed to be able to read and write
all of the registers.

I believe the set of changes could be justified for the cleanups alone,
even if PTRACE_EVENT_EXIT did not need to be moved, which makes me
feel good about this approach.

If a way can be found to start coredumps from complete_signal (needed
for timely handling of fatal signals) instead of needing to start them
in do_coredump for proper synchronization, force_siginfo_to_task and
get_signal can be significantly simplified. As it is, a lot of checks
are duplicated to ensure that everything works properly in the presence
of do_coredump.

So far the code has been lightly tested, and the descriptions of some
of the patches are a bit light, but I think this shows the direction
I am aiming to travel for sorting out exit(2) and exit_group(2).

Eric W. Biederman (9):
  signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
  signal/seccomp: Refactor seccomp signal and coredump generation
  signal/seccomp: Dump core when there is only one live thread
  signal: Factor start_group_exit out of complete_signal
  signal/group_exit: Use start_group_exit in place of do_group_exit
  signal: Fold do_group_exit into get_signal fixing io_uring threads
  signal: Make individual tasks exiting a first class concept.
  signal/task_exit: Use start_task_exit in place of do_exit
  signal: Move PTRACE_EVENT_EXIT into get_signal

 arch/sh/kernel/cpu/fpu.c     |  10 +--
 fs/exec.c                    |  10 ++-
 include/linux/sched/jobctl.h |   2 +
 include/linux/sched/signal.h |   5 ++
 include/linux/sched/task.h   |   1 -
 kernel/exit.c                |  41 ++---------
 kernel/seccomp.c             |  45 +++---------
 kernel/signal.c              | 166 ++++++++++++++++++++++++++++++-------------
 8 files changed, 154 insertions(+), 126 deletions(-)