Re: [PATCH v3 2/2] mm/mprotect: special-case small folios when applying write permissions

From: Pedro Falcato

Date: Tue Apr 07 2026 - 04:24:34 EST


(Note: I had to manually adjust the To: and Cc: headers; it seems my neomutt
doesn't like something about your email.)

On Mon, Apr 06, 2026 at 05:50:26PM -0700, Davidlohr Bueso wrote:
> On Thu, 02 Apr 2026, Pedro Falcato wrote:
>
> > @@ -334,34 +371,20 @@ static long change_pte_range(struct mmu_gather *tlb,
> >
> > nr_ptes = mprotect_folio_pte_batch(folio, pte, oldpte, max_nr_ptes, flags);
> >
> > - oldpte = modify_prot_start_ptes(vma, addr, pte, nr_ptes);
> > - ptent = pte_modify(oldpte, newprot);
> > -
> > - if (uffd_wp)
> > - ptent = pte_mkuffd_wp(ptent);
> > - else if (uffd_wp_resolve)
> > - ptent = pte_clear_uffd_wp(ptent);
> > -
> > /*
> > - * In some writable, shared mappings, we might want
> > - * to catch actual write access -- see
> > - * vma_wants_writenotify().
> > - *
> > - * In all writable, private mappings, we have to
> > - * properly handle COW.
> > - *
> > - * In both cases, we can sometimes still change PTEs
> > - * writable and avoid the write-fault handler, for
> > - * example, if a PTE is already dirty and no other
> > - * COW or special handling is required.
> > + * Optimize for the small-folio common case by
> > + * special-casing it here. Compiler constant propagation
> > + * plus copious amounts of __always_inline does wonders.
> > */
> > - if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) &&
> > - !pte_write(ptent))
> > - set_write_prot_commit_flush_ptes(vma, folio, page,
> > - addr, pte, oldpte, ptent, nr_ptes, tlb);
> > - else
> > - prot_commit_flush_ptes(vma, addr, pte, oldpte, ptent,
> > - nr_ptes, /* idx = */ 0, /* set_write = */ false, tlb);
> > + if (likely(nr_ptes == 1)) {
>
> Are there any numbers for this optimization?

Yes, see the cover letter and the testing done by Luke, David, and me.

> While I am all for optimizing the common
> case, it seems unfair to penalize the uncommon one here. Why is nr_ptes > 1 such an
> exotic use case (especially today)?

It's less common on most architectures because there is no real incentive to
enable mTHP, so you get no anonymous folios with order > 0 (other than
PMD_ORDER ones, which are PMD-mapped). That leaves pagecache folios, which may
trivially be larger, but that depends on readahead (RA) and on the filesystem
itself (e.g. btrfs does not support large folios at all). Still, I would be
extremely surprised to see a regression beyond noise in the order > 0 case,
as it effectively does more work per iteration (and per my measurements you
easily get order >= 3 folios on ext4, up to maybe around order-7).
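
To illustrate the shape of the special case, here is a rough userspace sketch.
It is not the kernel code: every fake_* name is made up for illustration, and
the "PTE" is just a word with a write bit. The idea is that an always-inlined
worker called with a literal count of 1 gives the compiler a constant to
propagate, so the single-entry path can shed its loop, while the batched path
keeps the generic loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PTE: a plain word with a "writable" bit. */
typedef unsigned long fake_pte_t;
#define FAKE_PTE_WRITE 0x2UL

/*
 * Generic worker (hypothetical). Because it is always inlined, a caller
 * that passes a literal nr lets the compiler specialize the loop.
 */
static inline __attribute__((always_inline))
void fake_commit_ptes(fake_pte_t *ptep, size_t nr, int set_write)
{
	for (size_t i = 0; i < nr; i++) {
		if (set_write)
			ptep[i] |= FAKE_PTE_WRITE;
		else
			ptep[i] &= ~FAKE_PTE_WRITE;
	}
}

/* Caller that special-cases the single-entry (order-0) batch. */
void fake_change_range(fake_pte_t *ptep, size_t nr, int set_write)
{
	assert(nr >= 1);

	if (__builtin_expect(nr == 1, 1)) {
		/* nr is a compile-time constant in this inlined copy. */
		fake_commit_ptes(ptep, 1, set_write);
	} else {
		/* Batched case: same worker, runtime count. */
		fake_commit_ptes(ptep, nr, set_write);
	}
}
```

Both branches call the same worker, so the behaviour is identical; only the
code generated for the nr == 1 copy is expected to differ, and how much it
helps depends on the compiler and the surrounding code.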

> ie: How does this change affect the program in
> b9bf6c2872c ("mm: refactor MM_CP_PROT_NUMA skipping case into new function"),

That's the series I'm "fixing" to restore performance on order-0.

--
Pedro