Re: [PATCH v2 0/2] mm/mprotect: micro-optimization work
From: David Hildenbrand (Arm)
Date: Wed Apr 01 2026 - 04:36:27 EST
On 3/30/26 22:06, Andrew Morton wrote:
> On Mon, 30 Mar 2026 15:55:51 -0400 Luke Yang <luyang@xxxxxxxxxx> wrote:
>
>> Thanks for working on this. I just wanted to share that we've created a
>> test kernel with your patches and tested on the following CPUs:
>>
>> --- aarch64 ---
>> Ampere Altra
>> Ampere Altra Max
>>
>> --- x86_64 ---
>> AMD EPYC 7713
>> AMD EPYC 7351
>> AMD EPYC 7542
>> AMD EPYC 7573X
>> AMD EPYC 7702
>> AMD EPYC 9754
>> Intel Xeon Gold 6126
>> Intel Xeon Gold 6330
>> Intel Xeon Gold 6530
>> Intel Xeon Platinum 8351N
>> Intel Core i7-6820HQ
>>
>> --- ppc64le ---
>> IBM Power 10
>>
>> On average, we see improvements ranging from a minimum of 5% to a
>> maximum of 55%, with most improvements showing around a 25% speed up in
>> the libmicro/mprot_tw4m micro benchmark.
>
> Thanks, that's nice. I've added some of the above into the changelog
> and I took the liberty of adding your Tested-by: to both patches.
>
> fyi, regarding [2/2]: it's unclear to me whether the discussion with
> David will result in any alterations. If there's something I need to do,
> it always helps to lmk ;)

I think we want a better understanding of which exact __always_inline in
patch #2 is really helpful, and of where to apply the forced nr_ptes == 1
optimization.
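
For anyone following along, here is a minimal userspace sketch of the general pattern behind a forced nr_ptes == 1 path: when the batched helper is always inlined, a call site that passes a literal 1 lets the compiler drop the loop for the common single-PTE case. Everything here (fake_pte_t, apply_prot(), batch_len(), change_prot_range(), and the "same upper bits" batching rule) is hypothetical and only illustrates the technique; it is not the mm/mprotect.c code or what patch #2 does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PTE: the low 3 bits hold permission flags. */
typedef unsigned long fake_pte_t;
#define FAKE_PROT_MASK 0x7UL

/*
 * Forced inlining lets the compiler see each caller's "nr"; at a call
 * site passing the constant 1, the loop collapses to a single store.
 */
static inline __attribute__((always_inline)) void
apply_prot(fake_pte_t *ptes, size_t nr, unsigned long prot)
{
	for (size_t i = 0; i < nr; i++)
		ptes[i] = (ptes[i] & ~FAKE_PROT_MASK) | (prot & FAKE_PROT_MASK);
}

/* Hypothetical batching rule: consecutive entries with equal upper bits. */
static size_t batch_len(const fake_pte_t *ptes, size_t max)
{
	size_t n = 1;

	while (n < max &&
	       (ptes[n] & ~FAKE_PROT_MASK) == (ptes[0] & ~FAKE_PROT_MASK))
		n++;
	return n;
}

/* Walk @n entries, applying @prot batch by batch. */
static void change_prot_range(fake_pte_t *ptes, size_t n, unsigned long prot)
{
	size_t i = 0;

	while (i < n) {
		size_t nr = batch_len(&ptes[i], n - i);

		if (nr == 1)
			apply_prot(&ptes[i], 1, prot);	/* specialized, loop-free */
		else
			apply_prot(&ptes[i], nr, prot);	/* generic batched path */
		i += nr;
	}
}
```

The two branches call the same function; the point is only that the nr == 1 branch gives the compiler a constant to fold, which is the kind of effect worth measuring per __always_inline site.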

I updated the microbenchmark I use for fork+unmap etc. to measure
mprotect as well:
https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/pte-mapped-folio-benchmarks.c?ref_type=heads

Running some simple tests with order-0 on 1 GiB of memory:
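
As a rough illustration of what an order-0 write-protect/write-unprotect measurement can look like, here is a sketch assuming anonymous private memory, MADV_NOHUGEPAGE (best effort) to keep THP out of the picture, and CLOCK_MONOTONIC timing. bench_mprotect() is a hypothetical helper; this is not the code behind the URL above and does not reproduce its command-line arguments.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

/* Seconds elapsed between two CLOCK_MONOTONIC samples. */
static double elapsed_sec(const struct timespec *a, const struct timespec *b)
{
	return (double)(b->tv_sec - a->tv_sec) +
	       (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Hypothetical helper: map @len bytes of anonymous memory, request small
 * pages, fault everything in, then time one mprotect(PROT_READ)
 * ("write-protect") and one mprotect(PROT_READ | PROT_WRITE)
 * ("write-unprotect") over the whole range. Returns 0 on success.
 */
static int bench_mprotect(size_t len, double *wp_sec, double *wunp_sec)
{
	struct timespec t0, t1;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;
	madvise(p, len, MADV_NOHUGEPAGE);	/* best effort, errors ignored */
	memset(p, 1, len);			/* populate all PTEs */

	clock_gettime(CLOCK_MONOTONIC, &t0);
	if (mprotect(p, len, PROT_READ))
		goto fail;
	clock_gettime(CLOCK_MONOTONIC, &t1);
	*wp_sec = elapsed_sec(&t0, &t1);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	if (mprotect(p, len, PROT_READ | PROT_WRITE))
		goto fail;
	clock_gettime(CLOCK_MONOTONIC, &t1);
	*wunp_sec = elapsed_sec(&t0, &t1);

	munmap(p, len);
	return 0;
fail:
	munmap(p, len);
	return -1;
}
```

For 1 GiB this would be called as bench_mprotect((size_t)1 << 30, &wp, &wunp); a real benchmark would repeat the measurement several times rather than rely on a single sample.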

Upstream (Linus' tree):
$ ./pte-mapped-folio-benchmarks 0 write-protect 5
0.005779
...
./pte-mapped-folio-benchmarks 0 write-unprotect 5
0.009113
...
With Pedro's patch #2:
$ ./pte-mapped-folio-benchmarks 0 write-protect 5
0.003941
...
$ ./pte-mapped-folio-benchmarks 0 write-unprotect 5
0.006163
...
With the patch below:
$ ./pte-mapped-folio-benchmarks 0 write-protect 5
0.003364
$ ./pte-mapped-folio-benchmarks 0 write-unprotect 5
0.005729

So patch #2 can likely be improved further, and the forced inlining of
mprotect_folio_pte_batch() should probably go into a separate patch.
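
To put the numbers above into relative terms, a small sketch with hypothetical helper names, treating each value as a runtime in whatever unit the benchmark prints and using only the first sample shown for each run:

```c
#include <assert.h>

/* Runtime reduction in percent: 100 * (before - after) / before. */
static double pct_reduction(double before, double after)
{
	assert(before > 0.0);
	return 100.0 * (before - after) / before;
}

/* Speedup factor: how many times faster "after" is than "before". */
static double speedup(double before, double after)
{
	assert(after > 0.0);
	return before / after;
}
```

With those first samples, write-protect drops by roughly 31.8% with patch #2 and 41.8% with the patch below (a speedup of about 1.72x); write-unprotect drops by roughly 32.4% and 37.1%, respectively.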
---