[RFC PATCH resend 0/6] mm and ptrace: Track dumpability until task is freed

From: Jann Horn
Date: Fri Oct 16 2020 - 19:09:42 EST


[sorry, had to resend - it was pointed out to me that when I sent
this series the first time, DKIM got broken by the kvack list
rewriting 8-bit into quoted-printable]

At the moment, there is a lifetime issue (no, not the UAF kind) around
__ptrace_may_access():

__ptrace_may_access() wants to check mm->flags and mm->user_ns to figure
out whether the caller should be allowed to access some target task.
__ptrace_may_access() can be called as long as __put_task_struct()
hasn't happened yet; but __put_task_struct() happens when the task is
about to be freed, which is much later than exit_mm() (which happens
pretty early during task exit).
So we can have a situation where we need to consult the mm for a
security check, but we don't have an mm anymore.

At the moment, this is solved by failing open: If the mm is gone, we
pretend that it was dumpable. That's dubious from a security
perspective - as one example, we drop the mm_struct before the file
descriptor table, so someone might be able to steal file descriptors
from an exiting task when dumpability was supposed to prevent that.

The easy fix would be to let __ptrace_may_access() instead always refuse
access to tasks that have lost their mm; but then that would e.g. mean
that the ability to inspect dead tasks in procfs would be restricted.
So while that might work in practice, it'd be a bit ugly, too.
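
As a rough illustration of the two policies discussed above (the
current fail-open check and the stricter alternative), here is a small
userspace model. It is not kernel code: struct model_mm, struct
model_task and both function names are made up for this sketch, and a
single bool stands in for the real capability check against
mm->user_ns.

```c
/* Userspace model only; every name here is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_mm {
	bool dumpable;		/* stands in for the dumpability bits in mm->flags */
};

struct model_task {
	struct model_mm *mm;	/* NULL once the task has gone through exit_mm() */
};

/* Current behaviour: a task without an mm is treated as if it were dumpable. */
bool may_access_fail_open(const struct model_task *t, bool has_cap)
{
	assert(t != NULL);
	if (t->mm && !t->mm->dumpable && !has_cap)
		return false;
	return true;
}

/* The "easy fix": a task without an mm is never accessible. */
bool may_access_fail_closed(const struct model_task *t, bool has_cap)
{
	assert(t != NULL);
	if (!t->mm)
		return false;
	return t->mm->dumpable || has_cap;
}
```

In this model, a non-dumpable task becomes accessible to an unprivileged
caller as soon as its mm pointer is cleared under the fail-open policy,
while the fail-closed policy also locks out callers that would otherwise
have been allowed to inspect the dead task.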

Another option would be to move the dumpability information elsewhere -
but that would have to be the task_struct (the signal_struct can be
shared with dead pre-execve threads, so we can't use it here). So we'd
have to keep dumpability information in sync across threads - that'd
probably be pretty ugly.


So I think the proper fix is to let the task_struct hold a reference on
the mm_struct until the task goes away completely. This is implemented
in patch 1/6, which is also the only patch in this series that I
actually care about (and the only one with a stable backport marking);
the rest of the series consists of tweaks in case people dislike the
idea of constantly freeing mm_structs from workqueue context.
Those tweaks should also reduce the memory usage of dead tasks, by
ensuring that they don't keep their PGDs alive.
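
The idea of letting the task keep its mm reference until the task
itself is freed can be sketched in userspace roughly as follows. This
is a simplified model under stated assumptions, not the code from patch
1/6: the field name exit_mm is borrowed from the subject of patch 5/6,
but the struct layout, the model_* helpers and the plain int refcount
are invented for illustration, and the later mm_ref()/mm_unref() split
and the workqueue freeing are not modelled at all.

```c
/* Userspace model only; the model_* names and layouts are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct model_mm {
	int refs;		/* plain counter; the kernel uses atomic refcounts */
	bool dumpable;
};

struct model_task {
	struct model_mm *mm;		/* cleared early during exit */
	struct model_mm *exit_mm;	/* kept until the task is freed */
};

struct model_mm *model_mm_alloc(bool dumpable)
{
	struct model_mm *mm = malloc(sizeof(*mm));

	if (!mm)
		return NULL;
	mm->refs = 1;
	mm->dumpable = dumpable;
	return mm;
}

void model_mm_put(struct model_mm *mm)
{
	assert(mm->refs > 0);
	if (--mm->refs == 0)
		free(mm);
}

/* Early in exit: stop using the mm, but hand the reference to exit_mm. */
void model_exit_mm(struct model_task *t)
{
	t->exit_mm = t->mm;
	t->mm = NULL;
}

/* When the task is finally freed: drop the reference that was kept. */
void model_put_task(struct model_task *t)
{
	if (t->exit_mm) {
		model_mm_put(t->exit_mm);
		t->exit_mm = NULL;
	}
}

/* The access check can consult the dumpability even after exit_mm(). */
bool model_may_access(const struct model_task *t, bool has_cap)
{
	const struct model_mm *mm = t->mm ? t->mm : t->exit_mm;

	if (!mm)
		return false;
	return mm->dumpable || has_cap;
}
```

With this arrangement the check gives the same answer before and after
model_exit_mm(), so neither failing open nor refusing all access to
dead tasks is needed. The cost is that the mm stays allocated for as
long as the dead task does.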


Patch 1/6 is not particularly pretty, but I can't think of any better
way to do it.

So: Does this series (and in particular patch 1/6) look vaguely sane?
And if not, does anyone have a better approach?


Jann Horn (6):
  ptrace: Keep mm around after exit_mm() for __ptrace_may_access()
  refcount: Move refcount_t definition into linux/types.h
  mm: Add refcount for preserving mm_struct without pgd
  mm, oom: Use mm_ref()/mm_unref() and avoid mmdrop_async()
  ptrace: Use mm_ref() for ->exit_mm
  mm: remove now-unused mmdrop_async()

 arch/x86/kernel/tboot.c    |  2 +
 drivers/firmware/efi/efi.c |  2 +
 include/linux/mm_types.h   | 15 ++++++-
 include/linux/refcount.h   | 13 +-----
 include/linux/sched.h      |  8 ++++
 include/linux/sched/mm.h   | 13 ++++++
 include/linux/types.h      | 12 +++++
 kernel/exit.c              |  2 +
 kernel/fork.c              | 90 +++++++++++++++++---------------------
 kernel/ptrace.c            | 10 +++++
 mm/init-mm.c               |  2 +
 mm/oom_kill.c              |  2 +-
 12 files changed, 105 insertions(+), 66 deletions(-)


base-commit: bbf5c979011a099af5dc76498918ed7df445635b
--
2.29.0.rc1.297.gfa9743e501-goog