Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range

From: Hugh Dickins

Date: Mon Apr 06 2026 - 01:35:55 EST


On Mon, 6 Apr 2026, xu.xin16@xxxxxxxxxx wrote:
> >
> > But it's years since I worked on KSM or on anon_vma, so I may be confused
> > and my belief wrong. I have tried to test it, and my testcase did appear
> > to show 7.0-rc6 successfully swapping out even mremap-moved KSM folios,
> > but mm.git failing to do so.
>
> Thank you very much for providing such detailed historical context. However,
> I'm curious about your test case: how did you observe that KSM pages in mm.git
> could not be swapped out, while 7.0-rc6 worked fine?
>
> From the current implementation of mremap, before it succeeds, it always calls
> prep_move_vma() -> madvise(MADV_UNMERGEABLE) -> break_ksm(), which splits KSM pages
> into regular anonymous pages, which appears to be based on a patch you introduced
> over a decade ago, 1ff829957316(ksm: prevent mremap move poisoning). Given this,
> KSM pages should already be broken prior to the move, so they wouldn't remain as
> mergeable pages after mremap. Could there be a scenario where this breaking mechanism
> is bypassed, or am I missing a subtlety in the sequence of operations?

I'd completely forgotten that patch by now! But it's dealing with a
different issue; and note how it's intentionally leaving MADV_MERGEABLE
on the vma itself, just using MADV_UNMERGEABLE (with &dummy) as an
interface to CoW the KSM pages at that time, letting them be remerged after.

The sequence in my testcase was:

boot with mem=1G
echo 1 >/sys/kernel/mm/ksm/run
base = mmap(NULL, 3*PAGE_SIZE, PROT_READ|PROT_WRITE,
            MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
madvise(base, 3*PAGE_SIZE, MADV_MERGEABLE);
madvise(base, 3*PAGE_SIZE, MADV_DONTFORK); /* in case system() used */
memset(base, 0x77, 2*PAGE_SIZE);
sleep(1); /* I think not required */
mremap(base + PAGE_SIZE, PAGE_SIZE, PAGE_SIZE,
       MREMAP_MAYMOVE|MREMAP_FIXED, base + 2*PAGE_SIZE);
base2 = mmap(NULL, 512K, PROT_READ|PROT_WRITE,
             MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
madvise(base2, 512K, MADV_DONTFORK); /* in case system() used */
memset(base2, 0x77, 512K);
print pages_shared pages_sharing /* 1 1 expected, 1 1 seen */
run something to mmap 1G anon, touch all, touch again, exit
print pages_shared pages_sharing /* 0 0 expected, 1 1 seen */
exit

Those base2 lines were a late addition, needed to get the test without
the mremap to show 0 0 instead of 1 1 at the end; just as I had to apply
that pte_mkold-without-folio_mark_accessed patch to the kernel's mm/ksm.c.

Originally I was checking the testcase's /proc/pid/smaps manually
before exit; then found printing pages_shared pages_sharing easier.
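
For anyone wanting to try the sequence, its first half (up to and including
the mremap) plus the counter readout might be sketched in C roughly as below.
This is an illustrative sketch, not the actual testcase: the helper names
ksm_counter() and setup_moved_ksm_page() are made up here, the base2 lines and
the 1G memory-pressure step are left out, and it assumes a Linux system with
ksmd already started via /sys/kernel/mm/ksm/run.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for mremap() and the MREMAP_* flags */
#endif
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one counter (e.g. "pages_shared") from
 * /sys/kernel/mm/ksm/.  Returns -1 if the file is missing or unreadable,
 * as on a kernel built without CONFIG_KSM.
 */
long ksm_counter(const char *name)
{
	char path[128];
	long val = -1;
	FILE *f;

	snprintf(path, sizeof(path), "/sys/kernel/mm/ksm/%s", name);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

/*
 * Hypothetical helper: map three pages, mark them mergeable, fill the
 * first two with 0x77, then move the second page onto the third.
 * Returns the new address of the moved page, or MAP_FAILED.
 */
void *setup_moved_ksm_page(void)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	char *base, *moved;

	base = mmap(NULL, 3 * page, PROT_READ | PROT_WRITE,
		    MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (base == MAP_FAILED)
		return MAP_FAILED;

	/* Both calls are advisory here; report failure but carry on. */
	if (madvise(base, 3 * page, MADV_MERGEABLE))
		perror("madvise(MADV_MERGEABLE)");
	if (madvise(base, 3 * page, MADV_DONTFORK))
		perror("madvise(MADV_DONTFORK)");

	memset(base, 0x77, 2 * page);
	sleep(1);		/* probably not required */

	moved = mremap(base + page, page, page,
		       MREMAP_MAYMOVE | MREMAP_FIXED, base + 2 * page);
	if (moved != MAP_FAILED)
		assert(moved == base + 2 * page); /* MREMAP_FIXED target */
	return moved;
}
```

A caller would invoke setup_moved_ksm_page(), apply memory pressure by
whatever means is convenient, then print ksm_counter("pages_shared") and
ksm_counter("pages_sharing") before and after.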

Hugh