[PATCH v4 00/21] Avoid MAP_FIXED gap exposure

From: Liam R. Howlett
Date: Wed Jul 10 2024 - 15:25:34 EST


It is now possible to walk the vma tree using the rcu read locks and is
beneficial to do so to reduce lock contention. Doing so while a
MAP_FIXED mapping is executing means that a reader may see a gap in the
vma tree that should never logically exist - and does not when using the
mmap lock in read mode. The temporal gap exists because mmap_region()
calls munmap() prior to installing the new mapping.
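
For readers less familiar with the userspace side, here is a minimal sketch
(not part of this series) of the operation that triggers the path described
above: a MAP_FIXED mmap() placed over an existing mapping. The function name
replace_with_map_fixed() is made up for illustration, and the sketch assumes
Linux with anonymous mappings available. It only shows the replacement
semantics; it does not try to observe the kernel-internal gap.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map two anonymous pages, dirty them, then map over
 * the same range with MAP_FIXED. Returns 0 if the replacement landed at the
 * same address and the old contents are gone, -1 otherwise.
 */
int replace_with_map_fixed(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *old, *fresh;
	size_t len;
	int ret = -1;

	if (page <= 0)
		return -1;
	len = 2 * (size_t)page;

	old = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (old == MAP_FAILED)
		return -1;
	memset(old, 0xaa, len);

	/* MAP_FIXED discards whatever already overlaps [old, old + len). */
	fresh = mmap(old, len, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (fresh == MAP_FAILED) {
		munmap(old, len);
		return -1;
	}

	/* Same address, but now backed by fresh zero-filled memory. */
	if (fresh == old && fresh[0] == 0 && fresh[len - 1] == 0)
		ret = 0;

	munmap(fresh, len);
	return ret;
}
```

From the caller's point of view the range is never unmapped. The gap
discussed here is only visible to lockless walkers of the vma tree inside
the kernel.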

This patch set stops rcu readers from seeing the temporal gap by
splitting up the munmap() function into two parts. The first part
prepares the vma tree for modifications by doing the necessary splits
and tracks the vmas marked for removal in a side tree. The second part
completes the munmapping of the vmas after the vma tree has been
overwritten (either by a MAP_FIXED replacement vma or by a NULL in the
munmap() case).
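
The ordering change can be sketched with a toy userspace model. This is not
the kernel code: every name below (toy_tree, toy_removed, toy_gather_range()
and so on) is hypothetical, the "tree" is a plain array with one slot per
page, and the model is single threaded, so a "reader" is just a check made
between steps. It only illustrates why gathering first and overwriting
second leaves no point at which the range looks empty.

```c
#include <stddef.h>

#define NSLOTS 8	/* toy address space: one slot per page */

/* Hypothetical stand-in for the vma tree: slot -> vma id, 0 means a gap. */
struct toy_tree {
	int slot[NSLOTS];
};

/* Hypothetical stand-in for the side tree of vmas marked for removal. */
struct toy_removed {
	int id[NSLOTS];
	size_t count;
};

/* What a lockless reader would check: is any slot in [start, end) empty? */
int reader_sees_gap(const struct toy_tree *t, size_t start, size_t end)
{
	for (size_t i = start; i < end; i++)
		if (t->slot[i] == 0)
			return 1;
	return 0;
}

/* Part 1: record what will be removed; the tree itself is left untouched. */
void toy_gather_range(const struct toy_tree *t, struct toy_removed *r,
		      size_t start, size_t end)
{
	r->count = 0;
	for (size_t i = start; i < end; i++)
		if (t->slot[i] != 0)
			r->id[r->count++] = t->slot[i];
}

/* Overwrite the range: new_id for a MAP_FIXED replacement, 0 for munmap. */
void toy_overwrite(struct toy_tree *t, size_t start, size_t end, int new_id)
{
	for (size_t i = start; i < end; i++)
		t->slot[i] = new_id;
}

/* Part 2: release the gathered entries; returns how many were released. */
size_t toy_complete(struct toy_removed *r)
{
	size_t n = r->count;

	r->count = 0;
	return n;
}

/* Old ordering: clear the range, then install. Returns 1 if a gap showed. */
int old_order_exposes_gap(void)
{
	struct toy_tree t = { .slot = { 1, 1, 2, 2, 2, 3, 3, 3 } };
	int gap;

	toy_overwrite(&t, 2, 6, 0);		/* munmap first ... */
	gap = reader_sees_gap(&t, 2, 6);	/* ... reader looks here */
	toy_overwrite(&t, 2, 6, 9);		/* ... then the new mapping */
	return gap;
}

/* New ordering: gather, overwrite, complete. Returns 1 if a gap showed. */
int new_order_exposes_gap(void)
{
	struct toy_tree t = { .slot = { 1, 1, 2, 2, 2, 3, 3, 3 } };
	struct toy_removed r;
	int gap = 0;

	toy_gather_range(&t, &r, 2, 6);
	gap |= reader_sees_gap(&t, 2, 6);	/* old entries still present */
	toy_overwrite(&t, 2, 6, 9);
	gap |= reader_sees_gap(&t, 2, 6);	/* new entry already present */
	toy_complete(&r);
	gap |= reader_sees_gap(&t, 2, 6);
	return gap;
}
```

In the model, old_order_exposes_gap() reports a gap and
new_order_exposes_gap() does not. The per-slot loop in toy_overwrite() is a
simplification; the model only checks the tree between whole steps.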

Please note that rcu walkers will still be able to see a temporary state
of split vmas that may be in the process of being removed, but the
temporal gap will not be exposed. vma_start_write() is called on both
parts of the split vma, so this state is detectable.

RFC: https://lore.kernel.org/linux-mm/20240531163217.1584450-1-Liam.Howlett@xxxxxxxxxx/
v1: https://lore.kernel.org/linux-mm/20240611180200.711239-1-Liam.Howlett@xxxxxxxxxx/
v2: https://lore.kernel.org/all/20240625191145.3382793-1-Liam.Howlett@xxxxxxxxxx/
v3: https://lore.kernel.org/linux-mm/20240704182718.2653918-1-Liam.Howlett@xxxxxxxxxx/

Changes since v3:
- Completely removed arch_unmap() from the kernel. PPC doesn't need
  it and no one else uses it.
- Relocated the checks for mseal'ed vmas so they are only performed when
  necessary.
- Remove do_vma_munmap() and use do_vmi_align_munmap() in its place
- Added inclusive/exclusive comments for start/end of munmap
- Added comments for unmap_start/unmap_end to specify they are for PTEs
- Renamed "cleared_ptes" to "clear_ptes" and reversed the logic so that
  it is now a flag indicating that the ptes need to be cleared, rather
  than that they have already been cleared.
- Set the "clear_ptes" flag after a successful vms_gather_munmap_vmas()
- Rename vms_complete_pte_clear() to vms_clear_ptes() since it may
  happen before the completion of the vms in the case of a driver
  mmap'ing in mmap_region().
- Fixed comment around vms_clear_ptes() in mmap_region().
- Call init_vma_munmap() unconditionally in the mmap_region() case so
  that all defaults are set in the struct, which means
  init_vma_munmap() must support a NULL vma.
- Use ULONG_MAX as the limit in abort_munmap_vmas() for clarity
- Added a comment highlighting that the free_pgtables() call may use a
  different start/end depending on whether there is a prev/next vma
- Removed incorrect comment about VM_ACCOUNT and mremap's move_vma()
- Relocated the mas_store_gfp() call in vms_gather_munmap_vmas() so that
  it is clear that the accounting is okay.
- Skip validate_mm() in do_vmi_align_munmap() on gather failure as
  vms_gather_munmap_vmas() already validates.
- Added R-b from Lorenzo, Suren, and Kees - Thanks!


Liam R. Howlett (21):
  mm/mmap: Correctly position vma_iterator in __split_vma()
  mm/mmap: Introduce abort_munmap_vmas()
  mm/mmap: Introduce vmi_complete_munmap_vmas()
  mm/mmap: Extract the gathering of vmas from do_vmi_align_munmap()
  mm/mmap: Introduce vma_munmap_struct for use in munmap operations
  mm/mmap: Change munmap to use vma_munmap_struct() for accounting and
    surrounding vmas
  mm/mmap: Extract validate_mm() from vma_complete()
  mm/mmap: Inline munmap operation in mmap_region()
  mm/mmap: Expand mmap_region() munmap call
  mm/mmap: Support vma == NULL in init_vma_munmap()
  mm/mmap: Reposition vma iterator in mmap_region()
  mm/mmap: Track start and end of munmap in vma_munmap_struct
  mm/mmap: Clean up unmap_region() argument list
  mm/mmap: Avoid zeroing vma tree in mmap_region()
  mm/mmap: Use PHYS_PFN in mmap_region()
  mm/mmap: Use vms accounted pages in mmap_region()
  mm/mmap: Drop arch_unmap() call from all archs
  mm/mmap: Move can_modify_mm() check down the stack
  ipc/shm, mm: Drop do_vma_munmap()
  mm/mmap: Move may_expand_vm() check in mmap_region()
  mm/mmap: Drop incorrect comment from vms_gather_munmap_vmas()

 arch/powerpc/include/asm/mmu_context.h |   9 -
 arch/x86/include/asm/mmu_context.h     |   5 -
 include/asm-generic/mm_hooks.h         |  11 +-
 include/linux/mm.h                     |   6 +-
 ipc/shm.c                              |   8 +-
 mm/internal.h                          |  25 ++
 mm/mmap.c                              | 545 ++++++++++++++-----------
 7 files changed, 345 insertions(+), 264 deletions(-)

--
2.43.0