Re: [PATCH] mm/rmap: fix incorrect pte restoration for lazyfree folios

From: Lance Yang

Date: Thu Feb 26 2026 - 05:30:22 EST




On 2026/2/26 18:06, David Hildenbrand (Arm) wrote:
On 2/26/26 08:09, Lance Yang wrote:

On Tue, Feb 24, 2026 at 05:01:50PM +0100, David Hildenbrand (Arm) wrote:
On 2/24/26 12:43, Lorenzo Stoakes wrote:

Sorry I misread the original mail rushing through this is old... so this is less
pressing than I thought (for some reason I thought it was merged last cycle...!)
but it's a good example of how stuff can go unnoticed for a while.

In that case maybe a revert is a bit much and we just want the simplest possible
fix for backporting.

Dev volunteered to un-messify some of the stuff here. In particular, to
extend batching to all cases, not just some hand-selected ones.

Support for file folios is on the way.


But is the proposed 'just assume wrprotect' sensible? David?

In general, I think so. If PTEs were writable, they certainly have
PAE set. The write-fault handler can fully recover from that (as PAE is
set). If it's ever a performance problem (doubt), we can revisit.

I'm wondering whether we should just perform the wrprotect earlier:

diff --git a/mm/rmap.c b/mm/rmap.c
index 0f00570d1b9e..19b875ee3fad 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2150,6 +2150,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,

/* Nuke the page table entry. */
pteval = get_and_clear_ptes(mm, address, pvmw.pte, nr_pages);
+
+ /*
+ * Our batch might include writable and read-only
+ * PTEs. When we have to restore the mapping, just
+ * assume read-only to not accidentally upgrade
+ * write permissions for PTEs that must not be
+ * writable.
+ */
+ pteval = pte_wrprotect(pteval);
+
/*
* We clear the PTE but do not flush so potentially
* a remote CPU could still be writing to the folio


Given that nobody asks for writability (pte_write()) later.

Or does someone care?

Staring at set_tlb_ubc_flush_pending()->pte_accessible() I am
not 100% sure. Could pte_wrprotect() turn a PTE inaccessible on some
architecture (write-only)? I don't think so.


We have the following options:

1) pte_wrprotect(): fake that all was read-only.

Either we do it like Dev suggests, or we do it as above early.

The downside is that any code that might later want to know "was
this possibly writable" would get that information. Well, it wouldn't
get that information reliably *today* already (and that sounds a bit shaky).

Makes sense to me :)

2) Tell batching logic to honor pte_write()

Sounds suboptimal for some cases that really don't care in the future.

3) Tell batching logic to tell us if any pte was writable: FPB_MERGE_WRITE

... then we know for sure whether any PTE was writable and we could

(a) Pass it as we did before around to all checks, like pte_accessible().

(b) Have an explicit restore PTE where we play save.


I raised to Dev in private that softdirty handling is also shaky, as we
batch over that. Meaning that we could lose or gain softdirty PTE bits in
a batch.

I guess we won't lose soft_dirty bits - only gain them (false positive):

1) get_and_clear_ptes() merges dirty bits from all PTEs via pte_mkdirty()
2) pte_mkdirty() atomically sets both _PAGE_DIRTY and _PAGE_SOFT_DIRTY on
all architectures that support soft_dirty (x86, s390, powerpc, riscv)
3) set_ptes() uses pte_advance_pfn() which keeps all flags intact

So if any PTE in the batch was dirty, all PTEs become soft_dirty after
restore.

PTEs can be softdirty without being dirty. That over-complicates the
situation.

Ah, it's even trickier then :D