Re: [RFC PATCH 1/7] mm/mremap: introduce more mergeable mremap via MREMAP_RELOCATE_ANON

From: David Hildenbrand
Date: Tue Apr 01 2025 - 15:33:26 EST


On 31.03.25 16:50, Lorenzo Stoakes wrote:
On Sun, Mar 23, 2025 at 01:49:07PM +0100, David Hildenbrand wrote:

c) In -next, there is now the option to use folio lock +
folio_maybe_mapped_shared() == false. But it doesn't tell you how many
VMAs a large folio is mapped into.

In the following case:

[ folio ]
[ VMA#1 ] [ VMA#2 ]

c) would not tell you if you are fine modifying the folio when moving VMA#2.

Right, I feel like the prior checks should assert this is not the case,
however? But the mapcount check should be a last-ditch assurance?

Something nice might be hiding in c) that could handle a single folio
being covered by multiple VMAs.

I was thinking about the following:

[ folio0 ]
[ VMA#0 ]

Then we do a partial (old-school) mremap()

[ folio0 ] [ folio0 ]
[ VMA#1 ] [ VMA#2 ]

To then extend VMA#1 and fault in pages

[ folio0 ][ folio1 ] [ folio0 ]
[ VMA#1 ] [ VMA#2 ]

If that is possible (did not try! maybe something prevents us from
extending VMA#1), then mremap(MREMAP_RELOCATE_ANON) of VMA#1 / VMA#2 cannot work.

We'd have to detect that scenario (partial mremap). You might be doing that
with the anon-vma magic; something different might be: assume we flag large
folios if they were partially mremapped in any process.

Do we have spare folio flags? :)) I always lose track of the situation with this
and Matthew's levels of tolerance for it :P

For large folios yes (currently stored in page[1].flags)!
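
For illustration, a hypothetical flag along those lines could be wired up like
the existing second-page folio flags in include/linux/page-flags.h (all names
below are made up; a spare bit in page[1].flags would still have to be picked,
and the flag would be set from the mremap() move path whenever a large folio is
only partially covered by the move):

/*
 * Sketch only: "this large folio was partially mremapped at some point",
 * modeled on how PG_partially_mapped & friends are declared. A real
 * PG_partially_mremaped bit would have to be added to enum pageflags.
 */
FOLIO_FLAG(partially_mremaped, FOLIO_SECOND_PAGE)

/*
 * This generates folio_test_partially_mremaped(),
 * folio_set_partially_mremaped() and folio_clear_partially_mremaped(),
 * all operating on page[1].flags of a large folio.
 */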



Then (with folio lock only)

1) folio_maybe_mapped_shared() == false: mapped into a single process
2) folio_maybe_partially_mremaped() == false: not scattered in virtual
address space

It would be sufficient to check if the folio fully falls into the mremap()
range to decide if we can adjust the folio index etc.
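
Roughly, such a check could look like the following (untested sketch;
folio_maybe_mapped_shared() is the -next helper from above, while
folio_maybe_partially_mremaped() and can_relocate_anon_folio() are made-up
names):

/*
 * Sketch only: is it safe to adjust folio->index / folio->mapping for this
 * large anon folio while relocating the whole range [start, end)?
 * Caller holds the folio lock.
 */
static bool can_relocate_anon_folio(struct folio *folio,
		struct vm_area_struct *vma, unsigned long start,
		unsigned long end)
{
	/* Virtual address the head page is mapped at in this VMA. */
	unsigned long addr = vma->vm_start +
			     ((folio->index - vma->vm_pgoff) << PAGE_SHIFT);

	VM_WARN_ON_ONCE(!folio_test_locked(folio));

	/* Possibly mapped into another process (fork, ...)? Give up. */
	if (folio_maybe_mapped_shared(folio))
		return false;

	/* Possibly scattered over multiple VMAs by a partial mremap()? */
	if (folio_maybe_partially_mremaped(folio))
		return false;

	/* The folio must fall fully inside the range being moved. */
	return addr >= start && addr + folio_size(folio) <= end;
}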

We *might* be able to use that in the COW-reuse path for large folios to
perform a folio_move_anon_rmap(), which we currently only perform for small
folios / PMD-mapped folios (single mapping). Not sure yet if multiple
VMAs are actually involved.

Interesting... this is the wp_can_reuse_anon_folio() stuff? I'll have a look
into that!

Yes. For small folios / PMD-mapped THPs it's easy. I readded it in

Author: David Hildenbrand <david@xxxxxxxxxx>
Date: Mon May 9 18:20:43 2022 -0700

mm/rmap: use page_move_anon_rmap() when reusing a mapped PageAnon() page exclusively

We'd need something along the following lines for PTE-mapped large folios (todo):

diff --git a/mm/memory.c b/mm/memory.c
index a838c8c44bfdc..df18fcfe29aab 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3793,6 +3793,15 @@ static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
 		exclusive = true;
 unlock:
 	folio_unlock_large_mapcount(folio);
+
+	/* Not already rooted at this VMA's anon_vma? Then try to move it over. */
+	if (exclusive &&
+	    folio->mapping != (void *)((unsigned long)vma->anon_vma | PAGE_MAPPING_ANON) &&
+	    todo_check_folio_in_vma(folio, vma) && folio_trylock(folio)) {
+		folio_move_anon_rmap(folio, vma);
+		folio_unlock(folio);
+	}
+out:
 	return exclusive;
 }

Whereby a "partially mremapped" indication could come in handy.

Note that I never learned how important that optimization actually is in practice,
so I haven't cared too much about PTE-mapped large folios for now.


I'm concerned about partial cases moreso though, e.g.:

mremap this
<----------->
[ folio0 ]
[ VMA#0 ]

Right, there is no easy way to avoid splitting, I think. We should just try to split, and if it fails, too bad.


I mean, I'm leaning more towards just breaking up the folio, especially if we
consider a case like a biiig range:

mremap this
<--------------------------------------------------->
[ folio0 ][ folio1 ][ folio2 ][ folio3 ][ folio4 ][ folio5 ] (say order-9 each)
[ VMA#0 ]

At this point, refusing to do the whole thing seems like a bad idea;
splitting folio0 and folio5 might be sensible instead.

I guess a user is saying 'please, I really care about merging', so they might
well be willing to tolerate losing some of the huge page benefits, at least at
the edges here.

Yes. In uffdio_move we simply split all PTE-mapped large folios, because that way
folio_move_anon_rmap() is much easier to handle (and we could always try optimizing it
later if really required). Once we have khugepaged with mTHP support in place, it might
just fix things up for us later.
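
For the "split only at the edges" idea, a rough sketch could be the following
(untested; the helper and its calling convention are made up, only split_folio()
and its folio-lock requirement are real; the caller is assumed to walk the
folios mapped inside the range being moved):

/*
 * Sketch only: split a large folio iff it straddles an edge of the range
 * [start, end) being mremap()ed. Failure is tolerated; the caller then
 * simply skips relocating the anon state for this folio.
 * @addr is the virtual address the folio's head page is mapped at.
 */
static int maybe_split_edge_folio(struct folio *folio, unsigned long addr,
		unsigned long start, unsigned long end)
{
	int err;

	/* Fully inside the range: keep it large, nothing to split. */
	if (addr >= start && addr + folio_size(folio) <= end)
		return 0;

	if (!folio_trylock(folio))
		return -EAGAIN;
	/* May fail (extra references, ...); that's tolerated. */
	err = split_folio(folio);
	folio_unlock(folio);

	return err;
}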

--
Cheers,

David / dhildenb