Re: [PATCH 0/4] arm64/mm: contpte-sized exec folios for 16K and 64K pages

From: WANG Rui

Date: Sat Mar 14 2026 - 05:53:29 EST


I only just realized your focus was on 64K base pages; what I was
referring to here is AArch64 with 4K base pages.

Sorry about the earlier numbers; they were not very precise. The
RK3399 exposes a fairly limited set of PMU events, and it appears it
can’t collect events from the A53 and A72 clusters at the same time,
so I reran the measurements on the A53 cluster.

Even though the A53 backend isn’t very wide, the impact of iTLB
pressure is still visible. With 4K pages, aligning the code to PMD
size (2M) performs slightly better than aligning it to 64K.
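To put rough numbers on why the larger alignment helps, the arithmetic
below counts how many translations are needed to cover a code segment
at each mapping size with 4K base pages. The 24 MiB text size is a
made-up example, and the model assumes the core can cache a contpte
range or a PMD block as a single TLB entry; how well a particular core
such as the A53 exploits that is not something this shows.

```python
KIB = 1024
MIB = 1024 * KIB


def translations_needed(text_size: int, mapping_size: int) -> int:
    """Translations of mapping_size bytes needed to cover text_size bytes (rounded up)."""
    return -(-text_size // mapping_size)


# Hypothetical 24 MiB text segment on arm64 with 4K base pages.
# Assumption: each contpte range / PMD block costs one TLB entry.
TEXT_SIZE = 24 * MIB
MAPPINGS = {
    "4K PTE": 4 * KIB,        # one base page per entry
    "64K contpte": 64 * KIB,  # 16 contiguous 4K PTEs
    "2M PMD": 2 * MIB,        # one PMD block mapping
}

if __name__ == "__main__":
    for name, size in MAPPINGS.items():
        print(f"{name:12s} {translations_needed(TEXT_SIZE, size):6d}")
```

Under these assumptions the 2M mapping needs 32x fewer translations
than 64K contpte (12 vs. 384), which is consistent in direction with
the itlb-misses numbers below, though real miss counts depend on much
more than this.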

Binutils: 2.46
GCC: 15.2.1 (--enable-host-pie)

Workload: building vmlinux from Linux v7.0-rc1 with allnoconfig.
Loop: 5

                Base                Patchset [1]        Patchset [2]
instructions    1,994,512,163,037   1,994,528,896,322   1,994,536,148,574
cpu-cycles      6,890,054,789,351   6,870,685,379,047   6,720,442,248,967
                                    (-0.28%)            (-2.46%)
itlb-misses     579,692,117         455,848,211         43,814,795
                                    (-21.36%)           (-92.44%)
time elapsed    1331.15 s           1325.50 s           1296.35 s
                                    (-0.42%)            (-2.61%)

Percentages are relative to Base.
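The percentages above can be recomputed from the raw counters with a
few lines of Python. This is only a checking aid, not the tooling used
to collect the data; the counter values are copied from the table.

```python
# Raw counters as posted in the table (Base vs. the two patchsets).
RESULTS = {
    "Base": {
        "instructions": 1_994_512_163_037,
        "cpu-cycles": 6_890_054_789_351,
        "itlb-misses": 579_692_117,
        "time-elapsed-s": 1331.15,
    },
    "Patchset [1]": {
        "instructions": 1_994_528_896_322,
        "cpu-cycles": 6_870_685_379_047,
        "itlb-misses": 455_848_211,
        "time-elapsed-s": 1325.50,
    },
    "Patchset [2]": {
        "instructions": 1_994_536_148_574,
        "cpu-cycles": 6_720_442_248_967,
        "itlb-misses": 43_814_795,
        "time-elapsed-s": 1296.35,
    },
}


def pct_change(new: float, old: float) -> float:
    """Relative change of new versus old, in percent."""
    return (new - old) / old * 100.0


def deltas(results: dict, baseline: str = "Base") -> dict:
    """Percentage change of every counter against the baseline, rounded to 2 places."""
    base = results[baseline]
    return {
        name: {key: round(pct_change(value, base[key]), 2) for key, value in counters.items()}
        for name, counters in results.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, changes in deltas(RESULTS).items():
        print(name, changes)
```

Instruction counts change by well under 0.01% in both cases, which is
what you would expect if the gain comes from translation overhead
rather than from executing less code.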

Maybe we could make exec_folio_order() choose a different folio size
depending on the configuration, or make it conditional in some other
way, for example based on the size of the code segment?
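As a rough illustration of the kind of policy I mean, here is a
userspace model, not kernel code. pick_exec_folio_order() is a
hypothetical name; as far as I know exec_folio_order() does not look
at the mapping size today, so the text_size parameter and the "largest
mapping that fits" rule are assumptions for discussion only. The
contpte and PMD spans per base page size are the standard arm64 ones.

```python
KIB = 1024
MIB = 1024 * KIB

# arm64 mapping spans per base page size: PMD block and contiguous-PTE range.
ARM64_SPANS = {
    4 * KIB: {"pmd": 2 * MIB, "contpte": 64 * KIB},
    16 * KIB: {"pmd": 32 * MIB, "contpte": 2 * MIB},
    64 * KIB: {"pmd": 512 * MIB, "contpte": 2 * MIB},
}


def folio_order(span: int, page_size: int) -> int:
    """Order of a folio covering span bytes: log2 of the number of base pages."""
    return (span // page_size).bit_length() - 1


def pick_exec_folio_order(page_size: int, text_size: int) -> int:
    """Hypothetical policy: use the largest of {PMD, contpte} that fits in the text."""
    spans = ARM64_SPANS[page_size]
    for kind in ("pmd", "contpte"):
        if text_size >= spans[kind]:
            return folio_order(spans[kind], page_size)
    return 0  # text smaller than a contpte range: stay with base pages


if __name__ == "__main__":
    for page_size, text_size in ((4 * KIB, 30 * MIB), (4 * KIB, 1 * MIB), (64 * KIB, 30 * MIB)):
        print(page_size // KIB, "K pages,", text_size // MIB, "MiB text -> order",
              pick_exec_folio_order(page_size, text_size))
```

With 4K pages this picks order 9 (2M) for a large text segment and
order 4 (64K) for a small one; with 64K pages the 512M PMD rarely fits,
so it falls back to the 2M contpte size.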

[1] https://lore.kernel.org/all/20260310145406.3073394-1-usama.arif@xxxxxxxxx
[2] https://lore.kernel.org/linux-fsdevel/20260313005211.882831-1-r@xxxxxx

Thanks,
Rui