On 16.04.25 09:01, kernel test robot wrote:
> Hello,
> kernel test robot noticed a 7.8% regression of vm-scalability.throughput on:
> commit: 6af8cb80d3a9a6bbd521d8a7c949b4eafb7dba5d ("mm/rmap: basic MM owner tracking for large folios (!hugetlb)")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> testcase: vm-scalability
> config: x86_64-rhel-9.4
> compiler: gcc-12
> test machine: 256 threads 2 sockets GENUINE INTEL(R) XEON(R) (Sierra Forest) with 128G memory
> parameters:
> 	runtime: 300s
> 	size: 8T
> 	test: anon-cow-seq
> 	cpufreq_governor: performance
This should be the scenario with THP enabled. At first, I thought the
problem would be contention on the per-folio spinlock, but what makes me
scratch my head is the following:
13401 -16.5% 11190 proc-vmstat.thp_fault_alloc
... 3430623 -16.5% 2864565 proc-vmstat.thp_split_pmd
If we allocate fewer THPs, performance of the benchmark will obviously be
worse.
We allocated 2211 fewer THPs and had 566058 fewer THP PMD->PTE remappings.
566058 / 2211 = ~256, which is exactly the number of threads, and therefore
the number of child processes vm-scalability forks (one per thread).
So it was in fact the benchmark that effectively used 16.5% fewer THPs.
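Spelling the arithmetic out (a quick sanity-check sketch on my side, not
part of the report; the counter values are the quoted proc-vmstat numbers
above, and 256 is the thread count of the test machine):

# Values copied from the quoted proc-vmstat deltas; vm-scalability forks
# one child process per hardware thread, so nr_children is 256 here.
thp_fault_alloc_old, thp_fault_alloc_new = 13401, 11190
thp_split_pmd_old, thp_split_pmd_new = 3430623, 2864565
nr_children = 256

fewer_thps = thp_fault_alloc_old - thp_fault_alloc_new    # 2211
fewer_splits = thp_split_pmd_old - thp_split_pmd_new      # 566058

print(fewer_splits / fewer_thps)          # ~256: one missing PMD split per child per missing THP
print(fewer_thps / thp_fault_alloc_old)   # ~0.165: the 16.5% drop in THP allocations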
I don't see how this patch would affect the allocation of THPs in any
way (and I don't think it does).
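If someone wants to double-check that, a trivial helper like the following
(my own quick hack, not part of vm-scalability or the 0-day tooling) can
snapshot the two counters from /proc/vmstat around a run, with and without
the commit applied:

#!/usr/bin/env python3
# Quick hack: snapshot the THP counters discussed above from /proc/vmstat
# before and after a benchmark run and print the deltas.

def read_thp_counters(names=("thp_fault_alloc", "thp_split_pmd")):
    counters = {}
    with open("/proc/vmstat") as f:
        for line in f:
            name, value = line.split()
            if name in names:
                counters[name] = int(value)
    return counters

before = read_thp_counters()
input("Run the benchmark, then press Enter... ")
after = read_thp_counters()

for name, old in before.items():
    new = after[name]
    print(f"{name}: {old} -> {new} (delta {new - old})")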