[PATCH v2 4/4] docs/mm: update document for split i_mmap tree
From: Huang Shijie
Date: Thu Jun 11 2026 - 02:24:03 EST
Document the i_mmap locking changes introduced by the following patches:
- Use mapping_mapped() to simplify the code
- Use get_i_mmap_root() to access the file's i_mmap
- Split the file's i_mmap tree (CONFIG_SPLIT_I_MMAP)
Add documentation for:
- CONFIG_SPLIT_I_MMAP split i_mmap tree architecture with per-tree locks
- New per-tree lock helpers: i_mmap_tree_lock_write/unlock_write
- New vm_area_struct.tree_idx field for sibling tree selection
- Updated i_mmap_lock_read/write semantics acquiring all per-tree locks
- Updated lock ordering notes for split tree configuration
- Updated page table freeing section for split tree scenario
Signed-off-by: Huang Shijie <huangsj@xxxxxxxx>
---
Documentation/mm/process_addrs.rst | 63 +++++++++++++++++++++++-------
1 file changed, 49 insertions(+), 14 deletions(-)
diff --git a/Documentation/mm/process_addrs.rst b/Documentation/mm/process_addrs.rst
index 851680ead45f..4aed3100b249 100644
--- a/Documentation/mm/process_addrs.rst
+++ b/Documentation/mm/process_addrs.rst
@@ -60,6 +60,15 @@ Terminology
:c:func:`!i_mmap_[try]lock_write` for file-backed memory. We refer to these
locks as the reverse mapping locks, or 'rmap locks' for brevity.
+ When :c:macro:`!CONFIG_SPLIT_I_MMAP` is enabled, the file-backed i_mmap tree
+ is split into multiple sibling trees (one per NUMA node, or a number derived
+ from the CPU count). Each sibling is a :c:type:`!struct i_mmap_tree` holding
+ a red/black interval tree and its own :c:type:`!struct rw_semaphore`. In this
+ configuration, :c:func:`!i_mmap_lock_read` and :c:func:`!i_mmap_lock_write`
+ acquire every per-tree lock, while VMA insert/remove operations call
+ :c:func:`!i_mmap_tree_lock_write` to lock only the relevant sibling tree,
+ which reduces contention between operations on different sibling trees.
+
We discuss page table locks separately in the dedicated section below.
The first thing **any** of these locks achieve is to **stabilise** the VMA
@@ -230,12 +239,16 @@ These are the core fields which describe the MM the VMA belongs to and its attri
                                                          Updated under mmap read lock by
                                                          :c:func:`!task_numa_work`.
  :c:member:`!vm_userfaultfd_ctx`   CONFIG_USERFAULTFD    Userfaultfd context wrapper object of    mmap write,
-                                                         type :c:type:`!vm_userfaultfd_ctx`,      VMA write.
-                                                         either of zero size if userfaultfd is
-                                                         disabled, or containing a pointer
-                                                         to an underlying
-                                                         :c:type:`!userfaultfd_ctx` object which
-                                                         describes userfaultfd metadata.
+                                                         type :c:type:`!vm_userfaultfd_ctx`,      VMA write.
+                                                         either of zero size if userfaultfd is
+                                                         disabled, or containing a pointer
+                                                         to an underlying
+                                                         :c:type:`!userfaultfd_ctx` object which
+                                                         describes userfaultfd metadata.
+ :c:member:`!tree_idx`             CONFIG_SPLIT_I_MMAP   The index of the sibling i_mmap tree     Written once on
+                                                         that this VMA belongs to, set at         initial map.
+                                                         VMA creation time based on the NUMA
+                                                         node or the smallest sibling tree.
  ================================= ===================== ======================================== ===============
These fields are present or not depending on whether the relevant kernel
@@ -247,12 +260,18 @@ configuration option is set.
  Field                               Description                               Write lock
  =================================== ========================================= ============================
  :c:member:`!shared.rb`              A red/black tree node used, if the        mmap write, VMA write,
-                                     mapping is file-backed, to place the VMA  i_mmap write.
-                                     in the
-                                     :c:member:`!struct address_space->i_mmap`
-                                     red/black interval tree.
+                                     mapping is file-backed, to place the VMA  i_mmap write (or per-tree
+                                     in the                                    i_mmap write when
+                                     :c:member:`!struct address_space->i_mmap` :c:macro:`!CONFIG_SPLIT_I_MMAP`
+                                     red/black interval tree (or one of the    is set).
+                                     sibling trees when
+                                     :c:macro:`!CONFIG_SPLIT_I_MMAP`
+                                     is enabled).
  :c:member:`!shared.rb_subtree_last` Metadata used for management of the       mmap write, VMA write,
-                                     interval tree if the VMA is file-backed.  i_mmap write.
+                                     interval tree if the VMA is file-backed.  i_mmap write (or per-tree
+                                                                               i_mmap write when
+                                                                               :c:macro:`!CONFIG_SPLIT_I_MMAP`
+                                                                               is set).
  :c:member:`!anon_vma_chain`         List of pointers to both forked/CoW’d     mmap read, anon_vma write.
                                      :c:type:`!anon_vma` objects and
                                      :c:member:`!vma->anon_vma` if it is
@@ -490,6 +509,16 @@ There is also a file-system specific lock ordering comment located at the top of
Please check the current state of these comments which may have changed since
the time of writing of this document.
+.. note:: When :c:macro:`!CONFIG_SPLIT_I_MMAP` is enabled, the single
+ ``mapping->i_mmap_rwsem`` is replaced by an array of per-tree locks
+ ``mapping->i_mmap[i]->rwsem``. Each per-tree lock occupies the same
+ position in the lock ordering above as ``mapping->i_mmap_rwsem``.
+ VMA insert/remove operations acquire only the relevant
+ per-tree lock via :c:func:`!i_mmap_tree_lock_write`, while operations
+ that require all trees to be locked (such as
+ :c:func:`!unmap_mapping_range`) acquire all per-tree locks via
+ :c:func:`!i_mmap_lock_write` or :c:func:`!i_mmap_lock_read`.
+
------------------------------
Locking Implementation Details
------------------------------
@@ -704,11 +733,15 @@ traversed or referenced by concurrent tasks.
It is insufficient to simply hold an mmap write lock and VMA lock (which will
prevent racing faults, and rmap operations), as a file-backed mapping can be
-truncated under the :c:struct:`!struct address_space->i_mmap_rwsem` alone.
+truncated under the :c:struct:`!struct address_space->i_mmap_rwsem` alone
+(or, when :c:macro:`!CONFIG_SPLIT_I_MMAP` is enabled, under all per-tree
+``mapping->i_mmap[i]->rwsem`` locks acquired via
+:c:func:`!i_mmap_lock_write`).
As a result, no VMA which can be accessed via the reverse mapping (either
through the :c:struct:`!struct anon_vma->rb_root` or the :c:member:`!struct
-address_space->i_mmap` interval trees) can have its page tables torn down.
+address_space->i_mmap` interval trees, or the sibling trees when
+:c:macro:`!CONFIG_SPLIT_I_MMAP` is enabled) can have its page tables torn down.
The operation is typically performed via :c:func:`!free_pgtables`, which assumes
either the mmap write lock has been taken (as specified by its
@@ -729,7 +762,9 @@ cleared without page table locks (in the :c:func:`!pgd_clear`, :c:func:`!p4d_cle
.. note:: It is possible for leaf page tables to be torn down independent of
the page tables above it as is done by
:c:func:`!retract_page_tables`, which is performed under the i_mmap
- read lock, PMD, and PTE page table locks, without this level of care.
+ read lock (or all per-tree ``mapping->i_mmap[i]->rwsem`` locks in
+ read mode when :c:macro:`!CONFIG_SPLIT_I_MMAP` is enabled), PMD, and
+ PTE page table locks, without this level of care.
Page table moving
^^^^^^^^^^^^^^^^^
--
2.53.0