[PATCH v2 00/10] mm: remove vma_merge()

From: Lorenzo Stoakes
Date: Fri Aug 23 2024 - 16:14:41 EST


REVIEWERS NOTE:
This series is based on mm-unstable and rebased on Liam's series [0],
including the fix patch [1] sent for this. In order to review these
patches locally, if they are not already in mm-unstable, you will need to
apply those series before applying this one.

The infamous vma_merge() function has been the cause of a great deal of
pain, bugs and confusion for a very long time.

It is subtle, contains many corner cases, tries to do far too much and is
as a result very fragile.

That the function needs a numbering system to cover each possible
eventuality, with the many branches of its implementation annotated as to
which numbered case they handle, speaks to all this.

Some of this complexity is inherent - unfortunately there is no getting
away from the need to figure out precisely how to execute the merge,
whether we need to remove VMAs, whether it is safe to do so, what
constitutes a mergeable VMA and so on.

However, a lot of the complexity is not inherent but instead a product of
the function's 'organic' development.

Liam has gone to great lengths to improve the situation as a part of his
maple tree implementation, greatly improving the readability of the code,
and Vlastimil and I have additionally gone to lengths to try to
improve things further.

However, with the availability of userland VMA testing, it now becomes
possible to perform a rather more significant refactoring while maintaining
confidence in its correct operation.

An attempt was previously made by Vlastimil [2] to eliminate vma_merge(),
however it was rather - brutal - and an astute reader might refer to the
date of that patch for insight as to its intent.

This series instead divides merge operations into two natural kinds -
merges which occur when a NEW vma is being added to the address space, and
merges which occur when a vma is being MODIFIED.

Happily, the vma_expand() function introduced by Liam, which has the
capacity for also deleting a subsequent VMA, covers each of the NEW vma
cases.

By abstracting the actual final commit of changes to a VMA to its own
function, commit_merge(), and writing a wrapper around vma_expand() for
new VMA cases, vma_merge_new_range(), we can avoid having to use
vma_merge() for these instances altogether.

By doing so we are also able to then de-duplicate all existing merge logic
in mmap_region() and do_brk_flags() and have everything invoke this new
function, so we universally take the same approach to merging new VMAs.
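
As a rough illustration of the new-VMA case, the userland C sketch below
models VMAs as half-open ranges with a flags word. Everything in it
(struct toy_vma, toy_merge_new_range(), the enum) is hypothetical and not
taken from the series. It assumes "mergeable" means adjacent with identical
flags, whereas the real checks consider far more, and it ignores the case
where prev and next can each merge with the range but not with each other.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: a half-open range [start, end). */
struct toy_vma {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

/* Hypothetical outcomes of proposing a brand-new range. */
enum toy_new_merge {
	TOY_NEW_NONE,        /* nothing adjacent is compatible */
	TOY_NEW_EXPAND_PREV, /* prev grows upwards over the new range */
	TOY_NEW_EXPAND_NEXT, /* next grows downwards over the new range */
	TOY_NEW_EXPAND_BOTH, /* prev grows over the range and next; next goes away */
};

/* Assumption: "compatible" means adjacent and carrying identical flags. */
static enum toy_new_merge
toy_merge_new_range(const struct toy_vma *prev, const struct toy_vma *next,
		    unsigned long start, unsigned long end,
		    unsigned long flags)
{
	bool left = prev && prev->end == start && prev->flags == flags;
	bool right = next && next->start == end && next->flags == flags;

	if (left && right)
		return TOY_NEW_EXPAND_BOTH;
	if (left)
		return TOY_NEW_EXPAND_PREV;
	if (right)
		return TOY_NEW_EXPAND_NEXT;
	return TOY_NEW_NONE;
}
```

Every outcome other than TOY_NEW_NONE is an expansion of an existing
neighbour, which is why a single expand-style primitive can plausibly cover
all of the new-VMA cases.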

Having done so, we can then completely rework vma_merge() into
vma_merge_existing_range() and use this for the instances where a merge is
proposed for a region of an existing VMA.

This eliminates vma_merge() and its numbered cases and instead divides
things into logical cases - merge both, merge left, merge right (the latter
two being either partial or full merges).
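
To make the logical cases concrete, here is a minimal userland C sketch
that classifies a proposed modification of [start, end) within an existing
VMA. All names are hypothetical, and "mergeable" is assumed to mean
adjacent with flags equal to the new flags. It is a toy model of the case
split, not the code in this series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: a half-open range [start, end). */
struct toy_vma {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

enum toy_existing_merge {
	TOY_MERGE_NONE,
	TOY_MERGE_LEFT_PARTIAL,  /* prev absorbs the front part of vma */
	TOY_MERGE_LEFT_FULL,     /* prev and all of vma become one */
	TOY_MERGE_RIGHT_PARTIAL, /* next absorbs the back part of vma */
	TOY_MERGE_RIGHT_FULL,    /* all of vma and next become one */
	TOY_MERGE_BOTH,          /* prev, vma and next become one */
};

/* Classify changing [start, end) inside vma to new_flags. */
static enum toy_existing_merge
toy_merge_existing_range(const struct toy_vma *prev,
			 const struct toy_vma *vma,
			 const struct toy_vma *next,
			 unsigned long start, unsigned long end,
			 unsigned long new_flags)
{
	/* Does the modified range touch either edge of vma? */
	bool left_side = start == vma->start;
	bool right_side = end == vma->end;
	bool left = left_side && prev && prev->end == start &&
		    prev->flags == new_flags;
	bool right = right_side && next && next->start == end &&
		     next->flags == new_flags;

	if (left && right)
		return TOY_MERGE_BOTH;
	if (left)
		return right_side ? TOY_MERGE_LEFT_FULL : TOY_MERGE_LEFT_PARTIAL;
	if (right)
		return left_side ? TOY_MERGE_RIGHT_FULL : TOY_MERGE_RIGHT_PARTIAL;
	return TOY_MERGE_NONE;
}
```

In this model a merge is "full" when the modified range spans the whole of
vma and "partial" when it only touches one edge.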

The code is heavily annotated with ASCII diagrams and greatly simplified in
comparison to the existing vma_merge() function.

Having made this change, we take the opportunity to address an issue with
merging VMAs possessing a vm_ops->close() hook - commit 714965ca8252
("mm/mmap: start distinguishing if vma can be removed in mergeability
test") and commit fc0c8f9089c2 ("mm, mmap: fix vma_merge() case 7 with
vma_ops->close") make efforts to relax how we handle these, making
assumptions about which VMAs might end up deleted (and thus, if possessing
a vm_ops->close() hook, cannot be).

This refactor means we do not need to guess, so instead explicitly only
disallow merge in instances where a VMA with a vm_ops->close() hook would
be deleted (and try a smaller merge in cases where this is possible).
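
The rule can be sketched for the new-VMA case as follows. This is a toy
userland model with hypothetical names, assuming that merging with both
neighbours is the only plan that deletes a VMA (next). It mirrors the
behaviour the changelog describes for prev and next both having
vm_ops->close(): prev is expanded but not merged with next.

```c
#include <stdbool.h>

/* Hypothetical merge plans for a proposed new range. */
enum toy_plan {
	TOY_PLAN_NONE,
	TOY_PLAN_PREV_ONLY, /* expand prev over the new range */
	TOY_PLAN_NEXT_ONLY, /* expand next over the new range */
	TOY_PLAN_BOTH,      /* expand prev over the range and next, deleting next */
};

/*
 * Assumption: only TOY_PLAN_BOTH deletes a VMA, namely next. If next has a
 * close hook it must not be deleted, so fall back to the smaller merge.
 */
static enum toy_plan toy_apply_close_rule(enum toy_plan wanted,
					  bool next_has_close)
{
	if (wanted == TOY_PLAN_BOTH && next_has_close)
		return TOY_PLAN_PREV_ONLY;
	return wanted;
}
```

Plans that only expand a VMA pass through unchanged, since nothing with a
close hook is removed.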

In addition to these changes, we introduce a new vma_merge_struct
abstraction to allow VMA merge state to be threaded through the operation
neatly.
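
For readers unfamiliar with the idea, a state-threading struct of this kind
might look roughly like the sketch below. Only the mm, vma, prev, start,
end, flags and state members are named in this cover letter. The types,
the vmi and next members, the state values and the TOY_VMG_STATE()
initialiser are assumptions made for illustration, and the real definition
should be taken from mm/vma.h in the series.

```c
#include <stddef.h>

/* Opaque placeholders so the sketch compiles outside the kernel. */
struct mm_struct;
struct vm_area_struct;
struct vma_iterator;

/* Hypothetical state values; the letter only says a state field exists. */
enum toy_merge_state {
	TOY_MERGE_START,
	TOY_MERGE_ERROR_NOMEM,
	TOY_MERGE_SUCCESS,
};

/* Hypothetical layout: everything a merge needs, passed as one object. */
struct toy_vma_merge_struct {
	struct mm_struct *mm;
	struct vma_iterator *vmi;    /* assumed member */
	struct vm_area_struct *prev;
	struct vm_area_struct *next; /* assumed member */
	struct vm_area_struct *vma;  /* only set when modifying an existing VMA */
	unsigned long start;
	unsigned long end;
	unsigned long flags;
	enum toy_merge_state state;
};

/* Hypothetical initialiser; members not listed are zero-initialised. */
#define TOY_VMG_STATE(name, mm_, vmi_, start_, end_, flags_)	\
	struct toy_vma_merge_struct name = {			\
		.mm = (mm_),					\
		.vmi = (vmi_),					\
		.start = (start_),				\
		.end = (end_),					\
		.flags = (flags_),				\
		.state = TOY_MERGE_START,			\
	}
```

Passing one such object instead of a long parameter list means a new piece
of merge state can be added without touching every function signature.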

There is heavy unit testing provided for all merge functionality, added
prior to the refactoring, allowing for before/after testing.

The vm_ops->close() change also introduces exhaustive testing to
demonstrate that this functions as expected, and in addition to this the
reproduction code from commit fc0c8f9089c2 ("mm, mmap: fix vma_merge() case
7 with vma_ops->close") was tested and confirmed passing.

[0]:https://lore.kernel.org/all/20240822192543.3359552-1-Liam.Howlett@xxxxxxxxxx
[1]:https://lore.kernel.org/all/20240823133034.3527917-1-Liam.Howlett@xxxxxxxxxx
[2]:https://lore.kernel.org/linux-mm/20240401192623.18575-2-vbabka@xxxxxxx/

v2:
* Updated tests to function without the vmg change, and moved earlier in
series so we can test against the code _exactly_ as it was previously.
* Added vmg->mm to store mm_struct and avoid hacky container_of() in
vma_merge() prior to refactor. It's logical to thread this through.
* Stopped specifying vmg->vma for vma_merge_new_vma() from the start,
which was previously removed later in the series.
* Improved vma_modify_flags() to be better formatted for a large number of
flags.
* Removed if (vma) { ... } logic in mmap_region() and integrated the
approach from a later commit of putting logic into the if (next &&... )
block. Improved comment about why we are doing this.
* Introduced VMG_STATE() and VMG_VMA_STATE() macros and use these to avoid
duplication of initialisation of vmg state.
* Expanded the commit message for abstracting the policy comparison to
explain the logic.
* Reverted the use of vmg in vma_shrink() and split_vma().
* Reverted the cleanup of __split_vma() int -> bool as at this point fully
irrelevant to series.
* Reinstated incorrectly removed vmg.uffd_ctx assignment in mmap_region().
* Removed a confusing comment about assignment of vmg.end in early version
of mmap_region().
* Renamed vma_merge_new_vma() to vma_merge_new_range() and
vma_merge_modified() to vma_merge_existing_range(). This makes it clearer
what we're attempting to do.
* Stopped setting vmg parameters in do_brk_flags() that we did not set in
the original implementation, i.e. vma parameters for things like
anon_vma, uffd context, etc. which in the original implementation are not
checked in can_vma_merge_after().
* Moved VM_SPECIAL maple tree rewalk out of if (!prev && !next) { ... }
block in vma_merge_new_range() (which was changed to !next anyway). This
should always be done in the VM_SPECIAL case if vmg->prev is specified.
* Updated vma_merge_new_range() to correct the case where prev and next
could each be merged individually with the proposed range, but not
together.
* Updated vma_merge_new_range() to require that the caller sets prev and
next. This simplifies the logic and avoids unnecessary maple tree walks.
* Updated mmap_region() to update vmg->flags from vma->vm_flags on merge
reattempt.
* Updated callers of vma_merge_new_range() to ensure we always point the
iterator at prev if it exists.
* Added new state field to vmg to allow for errors to be returned.
* Adjusted do_brk_flags() to read vmg->state and handle memory allocation
failures accordingly.
* Do not double-assign VM_SOFTDIRTY in do_brk_flags().
* Separated out move of vma_prepare(), init_vma_prep(), vma_complete(),
can_vma_merge_before(), can_vma_merge_after() functions to separate
commit.
* Adjusted commit_merge() change to initially _only_ have parameters
relevant to vma_expand() to make review easier.
* Reinstated 'vma iterator must be pointing to start' comment in
commit_merge().
* Adjusted commit_merge() again when introducing vma_merge_existing_range()
to accept parameters specific to existing range merges.
* Removed unnecessary abstraction of vmg->end in vma_merge_existing_range()
as only used once.
* Abstracted expanded parameter to local variable for clarity in
vma_merge_existing_range().
* Unlink anon_vma objects if VMA pre-allocation fails on commit_merge() in
vma_merge_existing_range() if any were duplicated. This was incorrectly
excluded from the refactor.
* Moved comment from close commit regarding merge_will_delete_both to
previous commit as unchanged behaviour.
* Corrected failure to assign vmg->flags after applying VM_ACCOUNT in
mmap_region() (this had caused a ~5% regression in do_brk_flags()
incidentally, now resolved).
* Added vmi assumptions and asserts in merge functions.
* Added lock asserts in merge functions.
* Added an assert to vma_merge_new_range() to ensure no VMA within
[vmg->start, vmg->end).
* Added additional comments describing why we are moving the iterator to
avoid maple tree re-walks.
* Added new test for the case of prev, next both with vm_ops->close()
adding a new VMA, which should result in prev being expanded but NOT
merged with next.
* Adjusted test code to do a mock version of anon_vma duplication, and
cleanup after itself.
* Adjusted test code to allow vma preallocation to fail so we can test
how we handle this.
* Added a test to assert correct anon_vma duplication behaviour.
* Added a test to assert that preallocation failure results in anon_vma
objects being unlinked.
* Corrected vma_expand() assumption - we need vma, next not prev.
* Reinstated removed VM_WARN_ON() around vp.anon_vma state in
commit_merge().
* Rebased over Pedro + Liam's changes.
* Updated test logic to handle current->{mm,pid,comm} fields after rebase
on Liam's changes which use these. Also added stub for pr_warn_once() for
the same reason.
* Adjusted logic fundamentals based on rebase - vma_merge_new_range() now
assumes vmi is pointing at the gap...

v1:
https://lore.kernel.org/linux-mm/cover.1722849859.git.lorenzo.stoakes@xxxxxxxxxx/

Lorenzo Stoakes (10):
tools: improve vma test Makefile
tools: add VMA merge tests
mm: introduce vma_merge_struct and abstract vma_merge(),vma_modify()
mm: remove duplicated open-coded VMA policy check
mm: abstract vma_expand() to use vma_merge_struct
mm: avoid using vma_merge() for new VMAs
mm: make vma_prepare() and friends static and internal to vma.c
mm: introduce commit_merge(), abstracting final commit of merge
mm: refactor vma_merge() into modify-only vma_merge_existing_range()
mm: rework vm_ops->close() handling on VMA merge

 mm/mmap.c                        |  106 +--
 mm/vma.c                         | 1297 ++++++++++++++++-------------
 mm/vma.h                         |  152 ++--
 tools/testing/vma/Makefile       |    6 +-
 tools/testing/vma/vma.c          | 1302 +++++++++++++++++++++++++++++-
 tools/testing/vma/vma_internal.h |   51 +-
 6 files changed, 2217 insertions(+), 697 deletions(-)

--
2.46.0