[PATCH v7 4/6] ksm: add pgoff into ksm_rmap_item
From: xu.xin16
Date: Sat May 30 2026 - 05:08:27 EST
From: xu xin <xu.xin16@xxxxxxxxxx>
The reason for adding pgoff to ksm_rmap_item has been discussed in previous
mailing list threads [1][2]. The main purpose is to let the KSM reverse mapping
obtain the original page's linear page index, so that during anon_vma_tree
traversal it can locate only the relevant VMAs and avoid scanning the entire
address space [0, ULONG_MAX].
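For illustration only, the linear page index mentioned above can be sketched in
userspace C as follows. The struct and function names prefixed with sketch_ are
hypothetical stand-ins, and the 4 KiB page size is an assumption; the arithmetic
follows what linear_page_index() does with vma->vm_start and vma->vm_pgoff.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12UL	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the two vm_area_struct fields used here. */
struct sketch_vma {
	unsigned long vm_start;	/* first address covered by the VMA */
	unsigned long vm_pgoff;	/* offset of vm_start, in pages */
};

/*
 * Number of whole pages between vm_start and address, plus vm_pgoff.
 * The shift discards any bits below PAGE_SHIFT in address.
 */
static unsigned long sketch_linear_page_index(const struct sketch_vma *vma,
					      unsigned long address)
{
	return ((address - vma->vm_start) >> SKETCH_PAGE_SHIFT) + vma->vm_pgoff;
}
```

With such an index stored in the rmap_item, a reverse-mapping walk could query
the anon_vma tree for just that one page offset rather than for the full range.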
To minimize the size impact of adding pgoff to ksm_rmap_item, a trick that
David suggested is to use a union that groups the members related to the
unstable tree together with the newly added linear page index. The members
that are valid only while the rmap_item is in the unstable tree are
oldchecksum and the age information (age and remaining_skips).
However, should_skip_rmap_item() in the smart scanning code needs a slight
modification, since it still uses the age information even when the rmap_item
is in a stable state (the page is not KSM), a situation that occurs during COW
faults. With the union, the size of the structure stays at 64 bytes.
We store pgoff the same way as rmap_item->anon_vma: it is set in
try_to_merge_with_ksm_page() when the page is merged and becomes a KSM page,
and it is reset in remove_rmap_item_from_tree(), remove_node_from_stable_tree()
and break_cow().
The reason for resetting pgoff in break_cow() deserves a separate explanation:
- When a page successfully becomes a KSM page (i.e., after stable_tree_append()
  sets STABLE_FLAG), both anon_vma and pgoff are stored and remain valid.
- However, during the merging process there are several failure paths where a
  page that was temporarily treated as a KSM page must be reverted back to an
  anonymous page. Examples include:
  * The second call to try_to_merge_with_ksm_page() fails in
    try_to_merge_two_pages().
  * stable_tree_insert() fails in cmp_and_merge_page().
  In such cases, break_cow() is invoked to break the COW mapping and discard
  the KSM state.
Currently, break_cow() already contains a put_anon_vma(rmap_item->anon_vma)
to release the reference taken during the aborted merge. Because pgoff is
logically paired with anon_vma (both are only meaningful when the rmap_item
is in a stable state), it must also be reset in break_cow() to avoid leaving
a stale pgoff value that could confuse subsequent rmap walks or scanning
logic.
[1] https://lore.kernel.org/all/adTPQSb-qSSHviJN@lucifer/
[2] https://lore.kernel.org/all/202604091806051535BJWZ_FTtdIm3Snk24ei_@xxxxxxxxxx/
Suggested-by: David Hildenbrand (Arm) <david@xxxxxxxxxx>
Signed-off-by: xu xin <xu.xin16@xxxxxxxxxx>
---
mm/ksm.c | 41 ++++++++++++++++++++++++++++++++++-------
1 file changed, 34 insertions(+), 7 deletions(-)
diff --git a/mm/ksm.c b/mm/ksm.c
index 7d5b76478f0b..4761ca3fa984 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -195,22 +195,28 @@ struct ksm_stable_node {
* @node: rb node of this rmap_item in the unstable tree
* @head: pointer to stable_node heading this list in the stable tree
* @hlist: link into hlist of rmap_items hanging off that stable_node
- * @age: number of scan iterations since creation
- * @remaining_skips: how many scans to skip
+ * @age: number of scan iterations since creation (unstable node)
+ * @remaining_skips: how many scans to skip (unstable node)
+ * @pgoff: pgoff into @anon_vma where the page is mapped (stable tree)
*/
struct ksm_rmap_item {
struct ksm_rmap_item *rmap_list;
union {
- struct anon_vma *anon_vma; /* when stable */
+ struct anon_vma *anon_vma; /* for reverse mapping, when stable */
#ifdef CONFIG_NUMA
int nid; /* when node of unstable tree */
#endif
};
struct mm_struct *mm;
unsigned long address; /* + low bits used for flags below */
- unsigned int oldchecksum; /* when unstable */
- rmap_age_t age;
- rmap_age_t remaining_skips;
+ union {
+ struct {
+ unsigned int oldchecksum;
+ rmap_age_t age;
+ rmap_age_t remaining_skips;
+ }; /* when unstable */
+ unsigned long pgoff; /* for reverse mapping, when stable */
+ };
union {
struct rb_node node; /* when node of unstable tree */
struct { /* when listed from stable tree */
@@ -776,6 +782,10 @@ static struct vm_area_struct *find_mergeable_vma(struct mm_struct *mm,
return vma;
}
+/*
+ * break_cow: actively break the write-protect of the VMA. This is called when
+ * rmap_item has not yet become stable, but page has been merged.
+ */
static void break_cow(struct ksm_rmap_item *rmap_item)
{
struct mm_struct *mm = rmap_item->mm;
@@ -787,6 +797,8 @@ static void break_cow(struct ksm_rmap_item *rmap_item)
* to undo, we also need to drop a reference to the anon_vma.
*/
put_anon_vma(rmap_item->anon_vma);
+ /* Reset pgoff that might overlay age-related information. (still unstable) */
+ rmap_item->pgoff = 0;
mmap_read_lock(mm);
vma = find_mergeable_vma(mm, addr);
@@ -899,6 +911,8 @@ static void remove_node_from_stable_tree(struct ksm_stable_node *stable_node)
VM_BUG_ON(stable_node->rmap_hlist_len <= 0);
stable_node->rmap_hlist_len--;
put_anon_vma(rmap_item->anon_vma);
+ /* Reset pgoff that might overlay age-related information. */
+ rmap_item->pgoff = 0;
rmap_item->address &= PAGE_MASK;
cond_resched();
}
@@ -1052,6 +1066,8 @@ static void remove_rmap_item_from_tree(struct ksm_rmap_item *rmap_item)
stable_node->rmap_hlist_len--;
put_anon_vma(rmap_item->anon_vma);
+ /* Reset pgoff that might overlay age-related information. */
+ rmap_item->pgoff = 0;
rmap_item->head = NULL;
rmap_item->address &= PAGE_MASK;
@@ -1598,8 +1614,15 @@ static int try_to_merge_with_ksm_page(struct ksm_rmap_item *rmap_item,
/* Unstable nid is in union with stable anon_vma: remove first */
remove_rmap_item_from_tree(rmap_item);
- /* Must get reference to anon_vma while still holding mmap_lock */
+ /*
+ * Must get reference to anon_vma while still holding mmap_lock.
+ * We set these two stable-state members here instead of in
+ * stable_tree_append(), presumably because we don't want to take
+ * mmap_read_lock again. Here mmap_read_lock is already held, taken
+ * for find_mergeable_vma() before merging.
+ */
rmap_item->anon_vma = vma->anon_vma;
+ rmap_item->pgoff = linear_page_index(vma, rmap_item->address);
get_anon_vma(vma->anon_vma);
out:
mmap_read_unlock(mm);
@@ -2458,6 +2481,10 @@ static bool should_skip_rmap_item(struct folio *folio,
if (folio_test_ksm(folio))
return false;
+ /* There is no age information in stable-tree nodes. */
+ if (rmap_item->address & STABLE_FLAG)
+ return false;
+
age = rmap_item->age;
if (age != U8_MAX)
rmap_item->age++;
--
2.25.1