[PATCH 14/14] mm/mmap: Change vma iteration order in do_vmi_align_munmap()

From: Liam R. Howlett
Date: Wed May 31 2023 - 22:18:08 EST


By delaying the setting of prev/next VMA until after the write of NULL,
the probability of the prev/next VMA already being in the CPU cache is
significantly increased, especially for larger munmap operations. It
also means that prev/next will be loaded closer to when they are used.

This has the consequence of needing to change the for_each() to a do {}
for_each() when writing to the side tree.

Since prev will be set later in the function, it is better to reverse
the splitting direction of the start VMA (modify the new_below argument
to __split_vma).

Using the vma_iter_prev_range() to walk back to the correct location in
the tree will, on the most part, mean walking within the CPU cache.
Usually, this is two steps vs a node reset and a tree re-walk.

Signed-off-by: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx>
---
mm/mmap.c | 27 ++++++++++++---------------
1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index eaebcc8f60d2..429e314bd134 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2395,20 +2395,17 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
goto map_count_exceeded;

- error = __split_vma(vmi, vma, start, 0);
+ error = __split_vma(vmi, vma, start, 1);
if (error)
goto start_split_failed;
-
- vma = vma_iter_load(vmi);
}

- prev = vma_prev(vmi);
-
/*
* Detach a range of VMAs from the mm. Using next as a temp variable as
* it is always overwritten.
*/
- for_each_vma_range(*vmi, next, end) {
+ next = vma;
+ do {
/* Does it split the end? */
if (next->vm_end > end) {
error = __split_vma(vmi, next, end, 0);
@@ -2440,13 +2437,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
BUG_ON(next->vm_start < start);
BUG_ON(next->vm_start > end);
#endif
- }
-
- if (vma_iter_end(vmi) > end)
- next = vma_iter_load(vmi);
-
- if (!next)
- next = vma_next(vmi);
+ } for_each_vma_range(*vmi, next, end);

#if defined(CONFIG_DEBUG_VM_MAPLE_TREE)
/* Make sure no VMAs are about to be lost. */
@@ -2467,12 +2458,18 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
BUG_ON(count != test_count);
}
#endif
- /* Point of no return */
- vma_iter_set(vmi, start);
+ while (vma_iter_addr(vmi) > start)
+ vma_iter_prev_range(vmi);
+
if (vma_iter_clear_gfp(vmi, start, end, GFP_KERNEL))
return -ENOMEM;

mm->map_count -= count;
+ prev = vma_iter_prev_range(vmi);
+ next = vma_next(vmi);
+ if (next)
+ vma_iter_prev_range(vmi);
+
/*
* Do not downgrade mmap_lock if we are next to VM_GROWSDOWN or
* VM_GROWSUP VMA. Such VMAs can change their size under
--
2.39.2