Re: madvise(MADV_COLLAPSE) fails with EINVAL on dirty file-backed text pages

From: Lorenzo Stoakes

Date: Fri Nov 07 2025 - 07:51:25 EST


On Fri, Nov 07, 2025 at 10:09:41AM +0000, Lorenzo Stoakes wrote:
> On Thu, Nov 06, 2025 at 10:05:41PM +0100, David Hildenbrand (Red Hat) wrote:
> > /*
> >  * The lock of new_folio is still held, we will be blocked in
> >  * the page fault path, which prevents the pte entries from
> >  * being set again. So even though the old empty PTE page may be
> >  * concurrently freed and a new PTE page is filled into the pmd
> >  * entry, it is still empty and can be removed.
> >  *
> >  * So here we only need to recheck if the state of pmd entry
> >  * still meets our requirements, rather than checking pmd_same()
> >  * like elsewhere.
> >  */
> > if (check_pmd_state(pmd) != SCAN_SUCCEED)
> > 	goto drop_pml;
> > ptl = pte_lockptr(mm, pmd);
> > if (ptl != pml)
> > 	spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
> >
> > /*
> >  * Huge page lock is still held, so normally the page table
> >  * must remain empty; and we have already skipped anon_vma
> >  * and userfaultfd_wp() vmas. But since the mmap_lock is not
> >  * held, it is still possible for a racing userfaultfd_ioctl()
> >  * to have inserted ptes or markers. Now that we hold ptlock,
> >  * repeating the anon_vma check protects from one category,
> >  * and repeating the userfaultfd_wp() check from another.
> >  */
> > if (likely(!vma->anon_vma && !userfaultfd_wp(vma))) {
> > 	pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
> > 	pmdp_get_lockless_sync();
> > 	success = true;
> > }
> >
> > Given !vma->anon_vma, we cannot have anon folios in there.
> >
> > Given !userfaultfd_wp(vma), we cannot have uffd-wp markers in there.
>
> Right.
>
> >
> > Given that all folios in the range we are collapsing were unmapped, we cannot have
> > them mapped there.
> >
> > So the conclusion is that the page table must be empty and can be removed.
> >
> >
> > Could guard markers be in there?
>
> Right now guard markers only exist if vma->anon_vma is set, including the
> file-backed case.
>
> But for file-backed guard regions after my VMA sticky series this won't be the
> case any more :)
>
> So I had better go change that...
>
> I hate that we have open-coded stuff all over the place that makes assumptions
> like this.
>
> This also ignores any other marker types. How I hate the uffd wp implementation.

OK I audited all vma->anon_vma uses and _this_ is literally the only place that
is affected :)
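
For anyone following along, the assumption at issue can be spelled out with a
toy userspace model. This is only a sketch, not kernel code: struct toy_vma
and both helpers are made-up names for illustration, and the real check is the
likely(!vma->anon_vma && !userfaultfd_wp(vma)) test quoted above.

```c
#include <stdbool.h>

/* Toy stand-in for a VMA; every name here is hypothetical, not kernel API. */
struct toy_vma {
	bool has_anon_vma;      /* models vma->anon_vma != NULL */
	bool uffd_wp;           /* models userfaultfd_wp(vma) */
	bool has_guard_markers; /* guard markers present in the PTE range */
};

/* Models the quoted check: !vma->anon_vma && !userfaultfd_wp(vma). */
static bool toy_check_allows_removal(const struct toy_vma *vma)
{
	return !vma->has_anon_vma && !vma->uffd_wp;
}

/*
 * True when the check would allow a page table that still holds guard
 * markers to be removed, i.e. the markers would be silently dropped.
 */
static bool toy_guard_markers_lost(const struct toy_vma *vma)
{
	return vma->has_guard_markers && toy_check_allows_removal(vma);
}
```

With guard markers implying anon_vma (the situation today), the check refuses
to remove the page table. Once file-backed guard regions can exist without
anon_vma, the same check lets it through, which is the case that needs fixing.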

Thanks for mentioning this :P I have written a self-test to repro it, and the
fix will land in v3 of the sticky VMA series.

Cheers, Lorenzo