Re: [PATCH V5 0/2] mm/khugepaged: fix dirty page handling for MADV_COLLAPSE

From: Andrew Morton

Date: Sun Jan 18 2026 - 15:22:30 EST


On Sun, 18 Jan 2026 19:09:38 +0000 Shivank Garg <shivankg@xxxxxxx> wrote:

> MADV_COLLAPSE on file-backed mappings fails with -EINVAL when TEXT pages
> are dirty. This affects scenarios like package/container updates or
> executing binaries immediately after writing them, etc.
>
> The issue is that collapse_file() triggers async writeback and returns
> SCAN_FAIL (maps to -EINVAL), expecting khugepaged to revisit later. But
> MADV_COLLAPSE is synchronous and userspace expects immediate success or
> a clear retry signal.
>
> Reproduction:
> - Compile or copy 2MB-aligned executable to XFS/ext4 FS
> - Call MADV_COLLAPSE on .text section
> - First call fails with -EINVAL (text pages dirty from copy)
> - Second call succeeds (async writeback completed)
>
> Issue Report:
> https://lore.kernel.org/all/4e26fe5e-7374-467c-a333-9dd48f85d7cc@xxxxxxx

Updated, thanks.

Please tolerate a little whining about the timeliess here. We're at
-rc6, v4 was added to mm.git over a month ago, had quite a lot of
review, this is very close to being moved into the mm-stable branch and now
we get v5. Argh.

> V5:
> - In patch 2/2, Simplify dirty writeback retry logic (David)

Are you sure this is the only change? It looks like a lot for a
simplification and I'm wondering if we should retain the v4 series and
defer a simplification for separate consideration during the next
cycle.

Below is how this updated altered mm.git. Could reviewers please check
this fairly soon?


--- a/mm/khugepaged.c~b
+++ a/mm/khugepaged.c
@@ -2788,11 +2788,11 @@ int madvise_collapse(struct vm_area_stru
hend = end & HPAGE_PMD_MASK;

for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
- bool retried = false;
int result = SCAN_FAIL;
+ bool triggered_wb = false;

- if (!mmap_locked) {
retry:
+ if (!mmap_locked) {
cond_resched();
mmap_read_lock(mm);
mmap_locked = true;
@@ -2812,52 +2812,27 @@ retry:

mmap_read_unlock(mm);
mmap_locked = false;
+ *lock_dropped = true;
result = hpage_collapse_scan_file(mm, addr, file, pgoff,
cc);
- fput(file);
- } else {
- result = hpage_collapse_scan_pmd(mm, vma, addr,
- &mmap_locked, cc);
- }
- if (!mmap_locked)
- *lock_dropped = true;
-
- /*
- * If the file-backed VMA has dirty pages, the scan triggers
- * async writeback and returns SCAN_PAGE_DIRTY_OR_WRITEBACK.
- * Since MADV_COLLAPSE is sync, we force sync writeback and
- * retry once.
- */
- if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !retried) {
- /*
- * File scan drops the lock. We must re-acquire it to
- * safely inspect the VMA and hold the file reference.
- */
- if (!mmap_locked) {
- cond_resched();
- mmap_read_lock(mm);
- mmap_locked = true;
- result = hugepage_vma_revalidate(mm, addr, false, &vma, cc);
- if (result != SCAN_SUCCEED)
- goto handle_result;
- }

- if (!vma_is_anonymous(vma) && vma->vm_file &&
- mapping_can_writeback(vma->vm_file->f_mapping)) {
- struct file *file = get_file(vma->vm_file);
- pgoff_t pgoff = linear_page_index(vma, addr);
+ if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
+ mapping_can_writeback(file->f_mapping)) {
loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
loff_t lend = lstart + HPAGE_PMD_SIZE - 1;

- mmap_read_unlock(mm);
- mmap_locked = false;
- *lock_dropped = true;
filemap_write_and_wait_range(file->f_mapping, lstart, lend);
+ triggered_wb = true;
fput(file);
- retried = true;
goto retry;
}
+ fput(file);
+ } else {
+ result = hpage_collapse_scan_pmd(mm, vma, addr,
+ &mmap_locked, cc);
}
+ if (!mmap_locked)
+ *lock_dropped = true;

handle_result:
switch (result) {
_