Re: [PATCH] mm/mremap: Honour writable bit in mremap pte batching

From: Pedro Falcato

Date: Tue Oct 28 2025 - 07:48:59 EST


On Tue, Oct 28, 2025 at 12:09:52PM +0530, Dev Jain wrote:
> Currently mremap folio pte batch ignores the writable bit during figuring
> out a set of similar ptes mapping the same folio. Suppose that the first
> pte of the batch is writable while the others are not - set_ptes will
> end up setting the writable bit on the other ptes, which is a violation
> of mremap semantics. Therefore, use FPB_RESPECT_WRITE to check the writable
> bit while determining the pte batch.
>

Hmm, it seems to me like we're doing the wrong thing by default here?
I must admit I haven't followed the contpte work as much as I would've
liked, but it doesn't make much sense to me why FPB_RESPECT_WRITE would
be an option you have to explicitly pass, and why folio_pte_batch (the
"simple" interface) doesn't Just Do The Right Thing for naive callers.
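
To make the failure mode concrete, here is a toy userspace model of the
problem, not kernel code. struct toy_pte and every toy_* function are
made up for illustration; they only mimic the rough shape of "find a
batch, then install it from the first pte", which is my reading of what
the mremap path does with set_ptes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy pte, not the kernel's pte_t. */
struct toy_pte {
	unsigned long pfn;
	bool writable;
};

/*
 * Count how many ptes starting at ptes[0] map consecutive pfns.
 * The writable bit is only compared when respect_write is set.
 */
static size_t toy_pte_batch(const struct toy_pte *ptes, size_t max_nr,
			    bool respect_write)
{
	size_t nr;

	if (max_nr == 0)
		return 0;
	for (nr = 1; nr < max_nr; nr++) {
		if (ptes[nr].pfn != ptes[0].pfn + nr)
			break;
		if (respect_write && ptes[nr].writable != ptes[0].writable)
			break;
	}
	return nr;
}

/*
 * Model of a set_ptes()-style helper: install nr entries derived from
 * one template pte, advancing only the pfn. All other bits are copied.
 */
static void toy_set_ptes(struct toy_pte *dst, struct toy_pte first, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		dst[i] = first;
		dst[i].pfn = first.pfn + i;
	}
}

/* Move max_nr ptes from src to dst, one batch at a time. */
static void toy_move_ptes(struct toy_pte *dst, const struct toy_pte *src,
			  size_t max_nr, bool respect_write)
{
	size_t i = 0;

	while (i < max_nr) {
		size_t nr = toy_pte_batch(src + i, max_nr - i, respect_write);

		assert(nr >= 1 && nr <= max_nr - i);
		toy_set_ptes(dst + i, src[i], nr);
		i += nr;
	}
}
```

With a source range of one writable pte followed by three read-only
ptes of the same folio, respect_write == false yields a single batch of
four and every destination pte comes out writable. With
respect_write == true the range splits into batches of one and three
and the read-only ptes stay read-only.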

Auditing all callers:
- khugepaged clears a variable number of ptes
- memory.c clears a variable number of ptes
- mempolicy.c grabs folios for migrations
- mlock.c steps over nr_ptes - 1 ptes, speeding up traversal
- mremap is borked since we're remapping nr_ptes ptes
- rmap.c TTU unmaps nr_ptes ptes for a given folio

so while the vast majority of callers don't seem to care, it would make
sense that folio_pte_batch() works conservatively by default, and
folio_pte_batch_flags() would allow for further batching (or maybe
we would add a separate folio_pte_batch_clear() or
folio_pte_batch_greedy() or whatnot).
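
Roughly the API shape I have in mind, again as a toy userspace sketch
and not a patch. TOY_BATCH_IGNORE_WRITE, toy_batch() and
toy_batch_flags() are hypothetical names; the only point is that the
plain helper is the strict one and callers have to opt in to the
relaxed behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy pte, not the kernel's pte_t. */
struct toy_pte {
	unsigned long pfn;
	bool writable;
};

/*
 * Hypothetical opt-out flag: callers that only clear or step over ptes
 * can ask for the writable bit to be ignored.
 */
#define TOY_BATCH_IGNORE_WRITE	(1u << 0)

/* Flag-taking variant: strict unless the caller relaxes it. */
static size_t toy_batch_flags(const struct toy_pte *ptes, size_t max_nr,
			      unsigned int flags)
{
	size_t nr;

	assert(ptes != NULL);
	if (max_nr == 0)
		return 0;
	for (nr = 1; nr < max_nr; nr++) {
		if (ptes[nr].pfn != ptes[0].pfn + nr)
			break;
		if (!(flags & TOY_BATCH_IGNORE_WRITE) &&
		    ptes[nr].writable != ptes[0].writable)
			break;
	}
	return nr;
}

/* "Simple" interface: conservative, safe for callers that re-install ptes. */
static size_t toy_batch(const struct toy_pte *ptes, size_t max_nr)
{
	return toy_batch_flags(ptes, max_nr, 0);
}
```

A naive caller of toy_batch() can never merge ptes with differing
writable bits, while the clear-only callers listed above would pass
TOY_BATCH_IGNORE_WRITE to keep their larger batches.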

> Cc: stable@xxxxxxxxxxxxxxx #6.17
> Fixes: f822a9a81a31 ("mm: optimize mremap() by PTE batching")
> Reported-by: David Hildenbrand <david@xxxxxxxxxx>
> Debugged-by: David Hildenbrand <david@xxxxxxxxxx>
> Signed-off-by: Dev Jain <dev.jain@xxxxxxx>

But the solution itself looks okay to me, so, fwiw:

Acked-by: Pedro Falcato <pfalcato@xxxxxxx>

--
Pedro