[PATCH 1/4] arm64: request contpte-sized folios for exec memory
From: Usama Arif
Date: Tue Mar 10 2026 - 11:27:30 EST
exec_folio_order() was introduced [1] to request readahead of executable
file-backed pages at an arch-preferred folio order, so that the hardware
can coalesce contiguous PTEs into fewer iTLB entries (contpte).
The current implementation uses ilog2(SZ_64K >> PAGE_SHIFT), which
requests 64K folios. This is optimal for 4K base pages (where CONT_PTES
= 16, contpte size = 64K), but suboptimal for 16K and 64K base pages:
Page size | Before (order) | After (order) | contpte
----------|----------------|---------------|----------------
4K        | 4 (64K)        | 4 (64K)       | Yes (unchanged)
16K       | 2 (64K)        | 7 (2M)        | Yes (new)
64K       | 0 (64K)        | 5 (2M)        | Yes (new)
For 16K pages, CONT_PTES = 128 and the contpte size is 2M (order 7).
For 64K pages, CONT_PTES = 32 and the contpte size is 2M (order 5).
Use ilog2(CONT_PTES) instead, which evaluates directly to the contpte-sized
folio order for every supported page size.
The worst-case waste is bounded to one folio (up to 2M - 64K) at the end
of the file, since page_cache_ra_order() reduces the folio order near EOF
to avoid allocating past i_size.
[1] https://lore.kernel.org/all/20250430145920.3748738-6-ryan.roberts@xxxxxxx/
Signed-off-by: Usama Arif <usama.arif@xxxxxxxxx>
---
arch/arm64/include/asm/pgtable.h | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b3e58735c49bd..a1110a33acb35 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1600,12 +1600,11 @@ static inline void update_mmu_cache_range(struct vm_fault *vmf,
#define arch_wants_old_prefaulted_pte cpu_has_hw_af
/*
- * Request exec memory is read into pagecache in at least 64K folios. This size
- * can be contpte-mapped when 4K base pages are in use (16 pages into 1 iTLB
- * entry), and HPA can coalesce it (4 pages into 1 TLB entry) when 16K base
- * pages are in use.
+ * Request exec memory is read into pagecache in contpte-sized folios. The
+ * contpte size is the span of contiguous PTEs that the hardware can coalesce
+ * into a single iTLB entry: 64K for 4K pages, 2M for 16K and 64K pages.
*/
-#define exec_folio_order() ilog2(SZ_64K >> PAGE_SHIFT)
+#define exec_folio_order() ilog2(CONT_PTES)
static inline bool pud_sect_supported(void)
{
--
2.47.3