Re: [PATCH V4 2/2] mm/khugepaged: retry with sync writeback for MADV_COLLAPSE

From: David Hildenbrand (Red Hat)

Date: Wed Jan 14 2026 - 15:14:15 EST


On 1/14/26 20:47, Garg, Shivank wrote:


On 1/11/2026 4:59 PM, David Hildenbrand (Red Hat) wrote:
On 1/10/26 19:20, Garg, Shivank wrote:


On 1/9/2026 8:16 PM, David Hildenbrand (Red Hat) wrote:
On 12/15/25 09:46, Shivank Garg wrote:


This looks a bit complicated. Can't we move that handing up, where we have most of that
information already? Or am I missing something important?

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 97d1b2824386f..c7271877c5220 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -22,6 +22,7 @@
  #include <linux/dax.h>
  #include <linux/ksm.h>
  #include <linux/pgalloc.h>
+#include <linux/backing-dev.h>
    #include <asm/tlb.h>
  #include "internal.h"
@@ -2786,7 +2787,9 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
           for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
                 int result = SCAN_FAIL;
+               bool triggered_wb = false;
  +retry:
                 if (!mmap_locked) {
                         cond_resched();
                         mmap_read_lock(mm);
@@ -2809,6 +2812,16 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
                         mmap_locked = false;
              *lock_dropped = true;
                         result = hpage_collapse_scan_file(mm, addr, file, pgoff,
                                                           cc);
+
+                       if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
+                           mapping_can_writeback(file->f_mapping)) {
+                               loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
+                               loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
+
+                               filemap_write_and_wait_range(file->f_mapping, lstart, lend);
+                               triggered_wb = true;

                  fput(file);

+                               goto retry;
+                       }
                         fput(file);
                 } else {
                         result = hpage_collapse_scan_pmd(mm, vma, addr,



Thank you for the suggestion, this approach looks much simpler.

There are two small nits I observed:

Yeah, was a quick untested hack to see if this can be simplified :)


1. In the retry loop, it is possible that we reacquire the mmap_lock and set
    mmap_locked to true. This can cause issues later when we do:

        if (!mmap_locked)
                *lock_dropped = true;

That whole logic of having two variables that express whether locks have been taken/dropped is just absolutely confusing. Any way we can clean that up?


    because the caller would no longer see that the lock was dropped earlier.

2. We need an fput() to balance the file reference taken at line 2795.

Ah, yes, makes sense. Having a single fput() would be nicer, but that would require yet another temporary variable.


I agree, that this interaction for lock taken/droped is confusing.
However, a proper clean-up would require refactoring the locking logic across multiple functions in the collapse call-flow path.
This seems significantly more invasive and risky.

I would like to handle this refactoring but in a separate TODO for later.
Could we please proceed with these minimal changes for now?

Sure, fine with me.


Since, V4 has been in the linux-next/mm-unstable for a while, should I send a v5 or an incremental clean-up on top for this?

Just send a v4, unless Andrew tells you otherwise :)

--
Cheers

David