[PATCH v1 0/7] Remove in-tree usage of MAP_DENYWRITE

From: David Hildenbrand
Date: Thu Aug 12 2021 - 04:44:27 EST

This series is based on v5.14-rc5 and corresponds code-wise to the
previously sent RFC [1] (the RFC still applied cleanly).

This series removes all in-tree usage of MAP_DENYWRITE from the kernel
and removes VM_DENYWRITE. We stopped supporting MAP_DENYWRITE for
user space applications a while ago because of the chance for DoS.
The last remaining users are binfmt binary loading during exec and
legacy library loading via uselib().

With this change, MAP_DENYWRITE is effectively ignored throughout the
kernel. Although the net change is small, I think the cleanup in mmap()
is quite nice.
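
As a rough user-space illustration of "effectively ignored", the sketch
below maps a scratch file with MAP_PRIVATE | MAP_DENYWRITE and then opens
the same file for writing. It assumes Linux with glibc. The helper name and
the /tmp path template are made up for this example and are not part of the
series, and the fallback flag value is the asm-generic one, which is an
assumption that does not hold on every architecture.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_DENYWRITE
#define MAP_DENYWRITE 0x0800	/* assumption: asm-generic value */
#endif

/*
 * Hypothetical helper, not part of the series.
 * Returns 1 if the file can be opened for writing while it is mapped
 * with MAP_DENYWRITE, 0 if that open fails, -1 on a setup error.
 */
int denywrite_mapping_allows_writers(void)
{
	char path[] = "/tmp/denywrite-XXXXXX";
	int result = -1;
	int fd, rfd, wfd;
	void *map;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, 4096) != 0) {
		close(fd);
		unlink(path);
		return -1;
	}
	close(fd);	/* drop the initial writer before mapping */

	rfd = open(path, O_RDONLY);
	if (rfd < 0) {
		unlink(path);
		return -1;
	}

	/* User-space mmap() accepts the flag but does not act on it. */
	map = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE | MAP_DENYWRITE, rfd, 0);
	if (map != MAP_FAILED) {
		wfd = open(path, O_WRONLY);
		result = wfd >= 0;
		if (wfd >= 0)
			close(wfd);
		munmap(map, 4096);
	}

	close(rfd);
	unlink(path);
	return result;
}
```

A return value of 1 means the mapping did not keep writers out, which is
what "ignored" amounts to from the point of view of an application.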

There are some (minor) user-visible changes with this series:
1. We no longer deny write access to shared libraries loaded via legacy
uselib(); this behavior matches modern user space, e.g., dlopen().
2. We no longer deny write access to the elf interpreter after exec
completed, treating it just like shared libraries (which it often is).
3. We always deny write access to the file linked via /proc/pid/exe:
sys_prctl(PR_SET_MM_EXE_FILE) will fail if write access to the file
cannot be denied, and write access to the file will remain denied
until the link is effectively gone (exec, termination,
PR_SET_MM_EXE_FILE) -- just as if exec'ing the file.
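
Item 3 builds on the existing behavior that a file being executed cannot be
opened for writing. A minimal probe of that behavior from user space might
look like the sketch below. It assumes Linux with /proc mounted; the helper
name is invented for this example, and the result depends on the running
kernel and on file permissions, so no particular output is promised.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to open our own executable for writing.
 * Returns 0 if the open succeeded, otherwise the errno from open().
 * ETXTBSY is the value that indicates write access is being denied.
 */
int open_own_exe_for_write(void)
{
	int fd = open("/proc/self/exe", O_WRONLY);
	int err;

	if (fd >= 0) {
		close(fd);
		return 0;
	}

	err = errno;	/* save before any other call can clobber it */
	return err;
}
```

A caller would compare the return value against ETXTBSY; values such as
EACCES or EROFS mean the open was rejected for unrelated reasons
(permissions, read-only mount).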

I was wondering if we really care about permanently disabling write access
to the executable, or if it would be good enough to just disable write
access while loading the new executable during exec; but I don't know
the history of that -- and it somewhat makes sense to deny write access
at least to the main executable. With modern user space -- dlopen() -- we
can effectively modify the content of shared libraries while they are in use.

There is a related problem [2] with overlayfs, that should at least partly
be tackled by this series. I don't quite understand the interaction of
overlayfs and deny_write_access()/allow_write_access() at exec time:

If we end up denying write access to the wrong file and not to the
realfile, that would be fundamentally broken. We would have to reroute
our deny_write_access()/allow_write_access() calls for the exec file to
the realfile -- but I leave figuring out the details to overlayfs guys, as
that would be a related but different issue.
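
For readers less familiar with these helpers, a simplified single-threaded
model of the per-inode counter behind them may help: the count is positive
while writers exist and negative while write access is denied, and each
side fails with -ETXTBSY if the other side currently holds it. The toy_*
names and the struct are invented for this sketch; the kernel operates
atomically on the real inode's i_writecount.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for an inode: > 0 means writers, < 0 means deniers. */
struct toy_inode {
	int writecount;
};

/* Model of get_write_access(): fails while write access is denied. */
int toy_get_write_access(struct toy_inode *inode)
{
	if (inode->writecount < 0)
		return -ETXTBSY;
	inode->writecount++;
	return 0;
}

/* Model of put_write_access(): drop one writer. */
void toy_put_write_access(struct toy_inode *inode)
{
	assert(inode->writecount > 0);
	inode->writecount--;
}

/* Model of deny_write_access(): fails while writers exist. */
int toy_deny_write_access(struct toy_inode *inode)
{
	if (inode->writecount > 0)
		return -ETXTBSY;
	inode->writecount--;
	return 0;
}

/* Model of allow_write_access(): drop one denier. */
void toy_allow_write_access(struct toy_inode *inode)
{
	assert(inode->writecount < 0);
	inode->writecount++;
}
```

In this model the counter lives in one specific inode, so denying write
access on one inode does nothing about writers that reach the data through
a different inode.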

RFC -> v1:
- "binfmt: remove in-tree usage of MAP_DENYWRITE"
-- Add a note that this should fix part of a problem with overlayfs

[1] https://lore.kernel.org/r/20210423131640.20080-1-david@xxxxxxxxxx/
[2] https://lore.kernel.org/r/YNHXzBgzRrZu1MrD@xxxxxxxxxxxxxxxxxxxxxxxxx/

David Hildenbrand (7):
binfmt: don't use MAP_DENYWRITE when loading shared libraries via
kernel/fork: factor out atomically replacing the current MM exe_file
kernel/fork: always deny write access to current MM exe_file
binfmt: remove in-tree usage of MAP_DENYWRITE
mm: remove VM_DENYWRITE
mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff()
fs: update documentation of get_write_access() and friends

 arch/x86/ia32/ia32_aout.c      |  8 ++--
 fs/binfmt_aout.c               |  7 ++--
 fs/binfmt_elf.c                |  6 +--
 fs/binfmt_elf_fdpic.c          |  2 +-
 fs/proc/task_mmu.c             |  1 -
 include/linux/fs.h             | 19 +++++----
 include/linux/mm.h             |  3 +-
 include/linux/mman.h           |  4 +-
 include/trace/events/mmflags.h |  1 -
 kernel/events/core.c           |  2 -
 kernel/fork.c                  | 75 ++++++++++++++++++++++++++++++----
 kernel/sys.c                   | 33 +--------------
 lib/test_printf.c              |  5 +--
 mm/mmap.c                      | 29 ++-----------
 mm/nommu.c                     |  2 -
 15 files changed, 98 insertions(+), 99 deletions(-)

base-commit: 36a21d51725af2ce0700c6ebcb6b9594aac658a6