Re: [PATCH mm-unstable v3 6/6] mm/mglru: rework workingset protection

From: kernel test robot
Date: Mon Dec 23 2024 - 03:46:07 EST




Hello,

kernel test robot noticed a 5.7% regression of will-it-scale.per_process_ops on:


commit: 3b7734aa8458b62ecbfd785ca7918e831565006e ("[PATCH mm-unstable v3 6/6] mm/mglru: rework workingset protection")
url: https://github.com/intel-lab-lkp/linux/commits/Yu-Zhao/mm-mglru-clean-up-workingset/20241208-061714
base: v6.13-rc1
patch link: https://lore.kernel.org/all/20241207221522.2250311-7-yuzhao@xxxxxxxxxx/
patch subject: [PATCH mm-unstable v3 6/6] mm/mglru: rework workingset protection

testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:

	nr_task: 100%
	mode: process
	test: pread2
	cpufreq_governor: performance
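
For readers unfamiliar with the workload, the following is a minimal sketch of the kind of loop a pread-style test exercises. It is not the will-it-scale source (which is written in C). The helper names make_file and pread_loop, the 4096-byte buffer, the fixed offset 0, and the file location are all assumptions made for illustration. The call traces later in this report show pread64 going through shmem_file_read_iter, so the real test reads a shmem-backed file; passing a tmpfs directory (typically /dev/shm on Linux) as `directory` would approximate that.

```python
import os
import tempfile


def make_file(size=4096, directory=None):
    # Hypothetical helper: create a zero-filled file of `size` bytes.
    # directory=None uses the default temp dir; a tmpfs path is an assumption.
    fd, path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size)
    return path


def pread_loop(path, iterations, buf_size=4096):
    # Issue `iterations` pread() calls at offset 0 and count completed ops.
    fd = os.open(path, os.O_RDONLY)
    ops = 0
    try:
        for _ in range(iterations):
            data = os.pread(fd, buf_size, 0)
            if len(data) != buf_size:
                raise OSError("short read")
            ops += 1
    finally:
        os.close(fd)
    return ops


if __name__ == "__main__":
    path = make_file()
    try:
        print(pread_loop(path, 100000), "pread() calls completed")
    finally:
        os.unlink(path)
```

In the report, nr_task: 100% with mode: process means one such worker process per hardware thread (104 here), and per_process_ops is the per-worker operation count.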




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202412231601.f1eb8f84-lkp@xxxxxxxxx


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241223/202412231601.f1eb8f84-lkp@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/pread2/will-it-scale

commit:
4a202aca7c ("mm/mglru: rework refault detection")
3b7734aa84 ("mm/mglru: rework workingset protection")

4a202aca7c7d9f99 3b7734aa8458b62ecbfd785ca79
---------------- ---------------------------
         %stddev     %change           %stddev
            \           |                 \
      1.03 ±  3%        -0.1        0.92 ±  5%  mpstat.cpu.all.usr%
      0.29 ± 14%      +20.8%        0.35 ±  7%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      1.02 ± 21%      +50.7%        1.54 ± 23%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.01 ± 50%      -66.9%        0.00 ± 82%  perf-stat.i.major-faults
      0.01 ± 50%      -73.6%        0.00 ±112%  perf-stat.ps.major-faults
    335982            -60.7%      132060 ± 15%  proc-vmstat.nr_active_anon
    335982            -60.7%      132060 ± 15%  proc-vmstat.nr_zone_active_anon
   1343709            -60.7%      528460 ± 15%  meminfo.Active
   1343709            -60.7%      528460 ± 15%  meminfo.Active(anon)
    259.96         +3.2e+05%      821511 ± 11%  meminfo.Inactive
   1401961             -5.7%     1321692 ±  2%  will-it-scale.104.processes
     13479             -5.7%       12708 ±  2%  will-it-scale.per_process_ops
   1401961             -5.7%     1321692 ±  2%  will-it-scale.workload
    138691 ± 43%      -75.8%       33574 ± 55%  numa-vmstat.node0.nr_active_anon
    138691 ± 43%      -75.8%       33574 ± 55%  numa-vmstat.node0.nr_zone_active_anon
    197311 ± 30%      -50.1%       98494 ± 18%  numa-vmstat.node1.nr_active_anon
    197311 ± 30%      -50.1%       98494 ± 18%  numa-vmstat.node1.nr_zone_active_anon
    554600 ± 43%      -75.8%      134360 ± 55%  numa-meminfo.node0.Active
    554600 ± 43%      -75.8%      134360 ± 55%  numa-meminfo.node0.Active(anon)
    173.31 ± 70%   +1.4e+05%      247821 ± 50%  numa-meminfo.node0.Inactive
    789291 ± 30%      -50.1%      394029 ± 18%  numa-meminfo.node1.Active
    789291 ± 30%      -50.1%      394029 ± 18%  numa-meminfo.node1.Active(anon)
     86.66 ±141%   +6.6e+05%      573998 ± 27%  numa-meminfo.node1.Inactive
     38.95                  -0.9       38.09        perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.folio_wait_bit_common.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read
     38.83                  -0.9       37.97        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.folio_wait_bit_common.shmem_get_folio_gfp.shmem_file_read_iter
     39.70                  -0.8       38.86        perf-profile.calltrace.cycles-pp.folio_wait_bit_common.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.__x64_sys_pread64
     41.03                  -0.8       40.26        perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.__x64_sys_pread64.do_syscall_64
      0.91                  +0.0        0.95        perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.__x64_sys_pread64
     53.14                  +0.5       53.66        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_wake_bit.shmem_file_read_iter.vfs_read
     53.24                  +0.5       53.76        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_wake_bit.shmem_file_read_iter.vfs_read.__x64_sys_pread64
     53.84                  +0.5       54.38        perf-profile.calltrace.cycles-pp.folio_wake_bit.shmem_file_read_iter.vfs_read.__x64_sys_pread64.do_syscall_64
     38.96                  -0.9       38.09        perf-profile.children.cycles-pp._raw_spin_lock_irq
     39.71                  -0.8       38.87        perf-profile.children.cycles-pp.folio_wait_bit_common
     41.04                  -0.8       40.26        perf-profile.children.cycles-pp.shmem_get_folio_gfp
     92.00                  -0.3       91.67        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      0.22                  -0.0        0.18 ±  3%  perf-profile.children.cycles-pp._copy_to_iter
      0.22 ±  2%        -0.0        0.19 ±  2%  perf-profile.children.cycles-pp.copy_page_to_iter
      0.20 ±  2%        -0.0        0.16 ±  4%  perf-profile.children.cycles-pp.rep_movs_alternative
      0.91                  +0.0        0.96        perf-profile.children.cycles-pp.filemap_get_entry
      0.00                  +0.3        0.35        perf-profile.children.cycles-pp.folio_mark_accessed
     53.27                  +0.5       53.80        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     53.86                  +0.5       54.40        perf-profile.children.cycles-pp.folio_wake_bit
     92.00                  -0.3       91.67        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.19                  -0.0        0.16 ±  3%  perf-profile.self.cycles-pp.rep_movs_alternative
      0.41                  +0.0        0.44        perf-profile.self.cycles-pp.shmem_get_folio_gfp
      0.37 ±  2%        +0.0        0.40        perf-profile.self.cycles-pp.folio_wait_bit_common
      0.90                  +0.0        0.94        perf-profile.self.cycles-pp.filemap_get_entry
      0.61                  +0.1        0.68        perf-profile.self.cycles-pp.shmem_file_read_iter
      0.00                  +0.3        0.34 ±  2%  perf-profile.self.cycles-pp.folio_mark_accessed
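
As a reading aid for the table above, the sketch below recomputes the middle column from the two mean values on a few rows. It assumes, based only on the numbers shown here, that rows with a "%" suffix report a relative change against the first commit, and that rows without it (mpstat and perf-profile) report an absolute difference. The function names pct_change and abs_delta are hypothetical and are not part of the lkp-tests tooling.

```python
def pct_change(base, new):
    # Relative change of `new` versus `base`, in percent.
    return (new - base) / base * 100.0


def abs_delta(base, new):
    # Absolute difference, as the rows without a "%" suffix appear to use.
    return new - base


# (base mean, new mean) pairs copied from the table above.
relative_rows = {
    "will-it-scale.per_process_ops": (13479, 12708),
    "will-it-scale.workload": (1401961, 1321692),
    "proc-vmstat.nr_active_anon": (335982, 132060),
    "meminfo.Inactive": (259.96, 821511),
}
absolute_rows = {
    "mpstat.cpu.all.usr%": (1.03, 0.92),
    "perf-profile.children.cycles-pp._raw_spin_lock_irq": (38.96, 38.09),
}

if __name__ == "__main__":
    for name, (base, new) in relative_rows.items():
        print(f"{pct_change(base, new):+12.1f}%  {name}")
    for name, (base, new) in absolute_rows.items():
        print(f"{abs_delta(base, new):+12.1f}   {name}")
```

Under these assumptions the headline figure follows directly: (12708 - 13479) / 13479 is about -5.7%.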




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki