Re: madvise(MADV_COLLAPSE) fails with EINVAL on dirty file-backed text pages

From: Lorenzo Stoakes

Date: Fri Nov 07 2025 - 05:11:57 EST


On Fri, Nov 07, 2025 at 10:12:02AM +0100, David Hildenbrand (Red Hat) wrote:
>
> >
> > 5. Yes, I'm calling madvise(MADV_COLLAPSE) on the text portion of the executable, using the address
> > range obtained from /proc/self/maps. IIUC, this should benefit applications by reducing ITLB pressure.
> >
> > I agree with the suggestions to either return EAGAIN instead of EINVAL or, at minimum, document the
> > EINVAL return for dirty pages. I'm happy to work on a patch.
>
> Of course, we could detect that we are in MADV_COLLAPSE and simply write back ourselves. After all,
> user space asked for a collapse, and it's not khugepaged that will simply revisit it later.
>
> I did something similar in
>
> commit ab73b29efd36f8916c6cc9954e912c4723c9a1b0
> Author: David Hildenbrand <david@xxxxxxxxxx>
> Date: Fri May 16 14:39:46 2025 +0200
>
>     s390/uv: Improve splitting of large folios that cannot be split while dirty
>
>     Currently, starting a PV VM on an iomap-based filesystem with large
>     folio support, such as XFS, will not work. We'll be stuck in
>     unpack_one()->gmap_make_secure(), because we can't seem to make progress
>     splitting the large folio.
>
> Where I effectively use filemap_write_and_wait_range().
>
> It could be used early to writeback the whole range to collapse once, possibly.

I agree, let's just do a sync flush unconditionally and fix this that way.

This is simpler than I thought. The key bit of information is that we have
freshly written the executable, so it sits in the page cache but is still dirty.
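
Until a kernel-side fix exists, user space could probably sidestep this by
flushing the file itself before asking for the collapse. The following is only
a rough, untested-on-your-setup sketch, assuming that fsync() on the backing
file is enough to clean the pages: parse_text_mapping() and
flush_then_collapse() are hypothetical helper names, and the MADV_COLLAPSE
fallback value is the asm-generic one (some architectures use a different
number, so prefer the definition from your libc headers).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* assumption: asm-generic UAPI value (Linux 6.1+) */
#endif

/*
 * Hypothetical helper: parse one line of /proc/<pid>/maps.
 * Returns 1 and fills start/end/path if the line describes a readable,
 * executable, file-backed mapping; returns 0 otherwise.
 */
static int parse_text_mapping(const char *line, unsigned long *start,
                              unsigned long *end, char *path, size_t path_len)
{
    char perms[5] = "";
    char file[4096] = "";
    unsigned long s, e;

    assert(line && start && end && path && path_len > 0);

    /* Fields: start-end perms offset dev inode [pathname] */
    if (sscanf(line, "%lx-%lx %4s %*s %*s %*s %4095[^\n]",
               &s, &e, perms, file) < 3)
        return 0;
    if (perms[0] != 'r' || perms[2] != 'x')
        return 0;               /* not readable + executable */
    if (file[0] != '/')
        return 0;               /* anonymous, [vdso], [heap], ... */
    if (strlen(file) >= path_len)
        return 0;               /* caller's buffer is too small */

    memcpy(path, file, strlen(file) + 1);
    *start = s;
    *end = e;
    return 1;
}

/*
 * Hypothetical helper: write back the file behind the mapping, then request
 * a synchronous collapse. Returns 0 on success, -1 with errno set otherwise.
 */
static int flush_then_collapse(const char *path, void *addr, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Linux accepts fsync() on a read-only descriptor. */
    int rc = fsync(fd);
    int saved_errno = errno;
    close(fd);
    if (rc != 0) {
        errno = saved_errno;
        return -1;
    }

    return madvise(addr, len, MADV_COLLAPSE);
}
```

The intended use would be to read /proc/self/maps line by line, feed each line
to parse_text_mapping(), and call flush_then_collapse(path, (void *)start,
end - start) for the mapping that matches the executable. If madvise() still
fails afterwards, the pages may have been redirtied or the failure has another
cause, so this is a workaround to try rather than a guarantee.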

Thanks, Lorenzo