Re: [PATCH mm-new v7 4/5] mm: khugepaged: skip lazy-free folios
From: Lance Yang
Date: Sat Feb 07 2026 - 23:10:36 EST
On 2026/2/8 05:38, David Hildenbrand (Arm) wrote:
On 2/7/26 14:51, Lance Yang wrote:
On 2026/2/7 16:34, Barry Song wrote:
On Sat, Feb 7, 2026 at 4:16 PM Vernon Yang <vernon2gm@xxxxxxxxx> wrote:
From: Vernon Yang <yanglincheng@xxxxxxxxxx>
For example, create three task: hot1 -> cold -> hot2. After all three
task are created, each allocate memory 128MB. the hot1/hot2 task
continuously access 128 MB memory, while the cold task only accesses
its memory briefly and then call madvise(MADV_FREE). However, khugepaged
still prioritizes scanning the cold task and only scans the hot2 task
after completing the scan of the cold task.
And if we collapse with a lazyfree page, that content will never be none
and the deferred shrinker cannot reclaim them.
So if the user has explicitly informed us via MADV_FREE that this memory
will be freed, it is appropriate for khugepaged to skip it only, thereby
avoiding unnecessary scan and collapse operations to reducing CPU
wastage.
Here are the performance test results:
(Throughput bigger is better, other smaller is better)
Testing on x86_64 machine:
| task hot2 | without patch | with patch | delta |
|---------------------|---------------|---------------|---------|
| total accesses time | 3.14 sec | 2.93 sec | -6.69% |
| cycles per access | 4.96 | 2.21 | -55.44% |
| Throughput | 104.38 M/sec | 111.89 M/sec | +7.19% |
| dTLB-load-misses | 284814532 | 69597236 | -75.56% |
Testing on qemu-system-x86_64 -enable-kvm:
| task hot2 | without patch | with patch | delta |
|---------------------|---------------|---------------|---------|
| total accesses time | 3.35 sec | 2.96 sec | -11.64% |
| cycles per access | 7.29 | 2.07 | -71.60% |
| Throughput | 97.67 M/sec | 110.77 M/sec | +13.41% |
| dTLB-load-misses | 241600871 | 3216108 | -98.67% |
Signed-off-by: Vernon Yang <yanglincheng@xxxxxxxxxx>
Acked-by: David Hildenbrand (arm) <david@xxxxxxxxxx>
Reviewed-by: Lance Yang <lance.yang@xxxxxxxxx>
---
include/trace/events/huge_memory.h | 1 +
mm/khugepaged.c | 13 +++++++++++++
2 files changed, 14 insertions(+)
diff --git a/include/trace/events/huge_memory.h b/include/trace/ events/huge_memory.h
index 384e29f6bef0..bcdc57eea270 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -25,6 +25,7 @@
EM( SCAN_PAGE_LRU, "page_not_in_lru") \
EM( SCAN_PAGE_LOCK, "page_locked") \
EM( SCAN_PAGE_ANON, "page_not_anon") \
+ EM( SCAN_PAGE_LAZYFREE, "page_lazyfree") \
EM( SCAN_PAGE_COMPOUND, "page_compound") \
EM( SCAN_ANY_PROCESS, "no_process_for_page") \
EM( SCAN_VMA_NULL, "vma_null") \
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 8b68ae3bc2c5..0d160e612e16 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -46,6 +46,7 @@ enum scan_result {
SCAN_PAGE_LRU,
SCAN_PAGE_LOCK,
SCAN_PAGE_ANON,
+ SCAN_PAGE_LAZYFREE,
SCAN_PAGE_COMPOUND,
SCAN_ANY_PROCESS,
SCAN_VMA_NULL,
@@ -583,6 +584,12 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
folio = page_folio(page);
VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
+ if (cc->is_khugepaged && !pte_dirty(pteval) &&
+ folio_test_lazyfree(folio)) {
We have two corner cases here:
Good catch!
1. Even if a lazyfree folio is dirty, if the VMA has the VM_DROPPABLE flag,
a lazyfree folio may still be dropped, even when its PTE is dirty.
Good point!
Right. When the VMA has VM_DROPPABLE, we would drop the lazyfree folio
regardless of whether it (or the PTE) is dirty in try_to_unmap_one().
So, IMHO, we could go with:
cc->is_khugepaged && folio_test_lazyfree(folio) &&
(!pte_dirty(pteval) || (vma->vm_flags & VM_DROPPABLE))
Hm. In a VM_DROPPABLE mapping all folios should be marked as lazy-free (see folio_add_new_anon_rmap()).
Ah, I missed that apparently :)
The new (collapse) folio will also be marked lazy (due to folio_add_new_anon_rmap()) free and can just get dropped any time.
So likely we should just not skip collapse for lazyfree folios in VM_DROPPABLE mappings?
if (cc->is_khugepaged && !(vma->vm_flags & VM_DROPPABLE) &&
folio_test_lazyfree(folio) && !pte_dirty(pteval)) {
...
}
Yep. That should be doing the trick. Thanks!