Re: [RFC PATCH V1 5/6] sched/numa: Allow recently accessed VMAs to be scanned

From: Oliver Sang
Date: Mon Sep 11 2023 - 22:53:20 EST


Hi Raghu,

On Mon, Sep 11, 2023 at 04:55:56PM +0530, Raghavendra K T wrote:
> On 9/10/2023 8:59 PM, kernel test robot wrote:
> > 341.49 -4.1% 327.42 ± 2% autonuma-benchmark.numa01.seconds
> > 186.67 ± 6% -27.1% 136.12 ± 7% autonuma-benchmark.numa01_THREAD_ALLOC.seconds
> > 21.17 ± 7% -33.6% 14.05 autonuma-benchmark.numa02.seconds
> > 2200 ± 2% -13.0% 1913 ± 3% autonuma-benchmark.time.elapsed_time
>
> Hello Oliver/Kernel test robot,
> Thank you a lot for testing.
>
> Results are impressive. Can I take this result as
> positive for whole series too?

FYI, we applied your patch set as below:

68cfe9439a1ba (linux-review/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007) sched/numa: Allow scanning of shared VMAs
af46f3c9ca2d1 sched/numa: Allow recently accessed VMAs to be scanned
167773d1ddb5f sched/numa: Increase tasks' access history
fc769221b2306 sched/numa: Remove unconditional scan logic using mm numa_scan_seq
1ef5cbb92bdb3 sched/numa: Add disjoint vma unconditional scan logic
2a806eab1c2e1 sched/numa: Move up the access pid reset logic
2f88c8e802c8b (tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well

In our tests, we also tested 68cfe9439a1ba. Comparing it to af46f3c9ca2d1:

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp6/numa01_THREAD_ALLOC/autonuma-benchmark

commit:
af46f3c9ca ("sched/numa: Allow recently accessed VMAs to be scanned")
68cfe9439a ("sched/numa: Allow scanning of shared VMAs")

af46f3c9ca2d1648 68cfe9439a1baa642e05883fa64
---------------- ---------------------------
%stddev %change %stddev
\ | \
327.42 ± 2% -1.1% 323.83 ± 3% autonuma-benchmark.numa01.seconds
136.12 ± 7% -25.1% 101.90 ± 2% autonuma-benchmark.numa01_THREAD_ALLOC.seconds
14.05 +1.5% 14.26 autonuma-benchmark.numa02.seconds
1913 ± 3% -7.9% 1763 ± 2% autonuma-benchmark.time.elapsed_time
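
For anyone re-deriving the %change column: it appears to be the relative difference of the right-hand mean versus the left-hand mean. Below is a minimal sketch, assuming that definition; the helper name pct_change is only illustrative and is not part of the lkp tooling. The report presumably works from unrounded means, so recomputing from the rounded values shown here can differ in the last digit (elapsed_time is such a case).

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` versus `base`, in percent (assumed definition)."""
    return (new - base) / base * 100.0


# (base, new) means copied from the table above:
# base = af46f3c9ca2d1, new = 68cfe9439a1ba
rows = {
    "numa01.seconds": (327.42, 323.83),
    "numa01_THREAD_ALLOC.seconds": (136.12, 101.90),
    "numa02.seconds": (14.05, 14.26),
    "time.elapsed_time": (1913.0, 1763.0),
}

for name, (base, new) in rows.items():
    # Negative values mean the run got faster (fewer seconds).
    print(f"{name:30s} {pct_change(base, new):+.1f}%")
```

A negative value on a *.seconds row means less run time, i.e. an improvement.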


Below is the full comparison, FYI.


af46f3c9ca2d1648 68cfe9439a1baa642e05883fa64
---------------- ---------------------------
%stddev %change %stddev
\ | \
36437 ± 9% +20.4% 43867 ± 10% meminfo.Mapped
0.02 ± 17% +0.0 0.03 ± 8% mpstat.cpu.all.iowait%
71.00 ± 2% +6.3% 75.50 turbostat.PkgTmp
3956991 ± 7% -15.0% 3361998 ± 5% sched_debug.cfs_rq:/.avg_vruntime.stddev
3956991 ± 7% -15.0% 3361997 ± 5% sched_debug.cfs_rq:/.min_vruntime.stddev
-30.18 +27.8% -38.56 sched_debug.cpu.nr_uninterruptible.min
1913 ± 3% -7.9% 1763 ± 2% time.elapsed_time
1913 ± 3% -7.9% 1763 ± 2% time.elapsed_time.max
3194802 -2.4% 3117907 time.minor_page_faults
210725 ± 3% -8.7% 192483 ± 3% time.user_time
327.42 ± 2% -1.1% 323.83 ± 3% autonuma-benchmark.numa01.seconds
136.12 ± 7% -25.1% 101.90 ± 2% autonuma-benchmark.numa01_THREAD_ALLOC.seconds
14.05 +1.5% 14.26 autonuma-benchmark.numa02.seconds
1913 ± 3% -7.9% 1763 ± 2% autonuma-benchmark.time.elapsed_time
1913 ± 3% -7.9% 1763 ± 2% autonuma-benchmark.time.elapsed_time.max
3194802 -2.4% 3117907 autonuma-benchmark.time.minor_page_faults
210725 ± 3% -8.7% 192483 ± 3% autonuma-benchmark.time.user_time
1.33 ± 91% -88.0% 0.16 ± 14% perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
0.09 ±194% +3204.2% 3.03 ± 66% perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
3.72 ± 9% -24.8% 2.80 ± 21% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
41.00 ±147% +2060.2% 885.67 ±105% perf-sched.wait_and_delay.count.io_schedule.migration_entry_wait_on_locked.__handle_mm_fault.handle_mm_fault
18.61 ± 18% -28.5% 13.30 ± 21% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
7.84 ±100% +354.6% 35.66 ± 89% perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
9285 ± 8% +20.1% 11152 ± 10% proc-vmstat.nr_mapped
6984234 -4.0% 6706018 proc-vmstat.numa_hit
5686795 ± 2% -5.2% 5390176 proc-vmstat.numa_local
10616944 ± 4% +15.7% 12279801 ± 3% proc-vmstat.numa_pages_migrated
7827793 ± 2% -5.2% 7421440 ± 2% proc-vmstat.pgfault
10616944 ± 4% +15.7% 12279801 ± 3% proc-vmstat.pgmigrate_success
334391 ± 2% -8.6% 305628 ± 2% proc-vmstat.pgreuse
20686 ± 4% +15.7% 23939 ± 3% proc-vmstat.thp_migration_success
14322816 ± 3% -8.2% 13147392 ± 2% proc-vmstat.unevictable_pgs_scanned
1339 ± 3% +8.6% 1454 ± 2% perf-stat.i.MPKI
1.716e+08 +2.8% 1.764e+08 perf-stat.i.branch-instructions
1.03 +0.1 1.11 ± 3% perf-stat.i.branch-miss-rate%
1707146 ± 2% +9.5% 1869960 ± 4% perf-stat.i.branch-misses
7.513e+08 ± 4% +11.1% 8.351e+08 ± 3% perf-stat.i.cache-misses
9.542e+08 ± 3% +8.9% 1.04e+09 ± 3% perf-stat.i.cache-references
534.57 -1.5% 526.34 perf-stat.i.cpi
158.57 +1.6% 161.11 perf-stat.i.cpu-migrations
582.98 ± 6% -11.4% 516.40 ± 3% perf-stat.i.cycles-between-cache-misses
2.263e+08 +2.2% 2.312e+08 perf-stat.i.dTLB-loads
8.538e+08 +2.5% 8.753e+08 perf-stat.i.instructions
15.98 ± 3% +8.9% 17.40 ± 3% perf-stat.i.metric.M/sec
3958 +3.0% 4075 perf-stat.i.minor-faults
37.89 ± 3% -3.6 34.28 ± 5% perf-stat.i.node-store-miss-rate%
2.585e+08 ± 4% -7.7% 2.385e+08 ± 3% perf-stat.i.node-store-misses
4.902e+08 ± 7% +21.1% 5.937e+08 ± 7% perf-stat.i.node-stores
3958 +2.9% 4075 perf-stat.i.page-faults
1116 ± 2% +6.2% 1186 ± 2% perf-stat.overall.MPKI
0.98 +0.1 1.04 ± 3% perf-stat.overall.branch-miss-rate%
448.99 -2.8% 436.60 perf-stat.overall.cpi
508.08 ± 4% -10.1% 456.56 ± 4% perf-stat.overall.cycles-between-cache-misses
0.00 +2.8% 0.00 perf-stat.overall.ipc
34.70 ± 6% -5.7 29.02 ± 7% perf-stat.overall.node-store-miss-rate%
1.698e+08 +2.8% 1.746e+08 perf-stat.ps.branch-instructions
1665855 ± 2% +9.5% 1824511 ± 3% perf-stat.ps.branch-misses
7.489e+08 ± 4% +10.9% 8.306e+08 ± 4% perf-stat.ps.cache-misses
9.447e+08 ± 3% +8.9% 1.029e+09 ± 3% perf-stat.ps.cache-references
158.05 +1.4% 160.31 perf-stat.ps.cpu-migrations
2.244e+08 +2.1% 2.292e+08 perf-stat.ps.dTLB-loads
8.46e+08 +2.5% 8.672e+08 perf-stat.ps.instructions
3906 +2.9% 4020 perf-stat.ps.minor-faults
284110 ± 5% +12.0% 318166 ± 2% perf-stat.ps.node-loads
2.584e+08 ± 3% -7.3% 2.395e+08 ± 3% perf-stat.ps.node-store-misses
4.879e+08 ± 7% +20.6% 5.883e+08 ± 7% perf-stat.ps.node-stores
3906 +2.9% 4020 perf-stat.ps.page-faults
1.622e+12 ± 2% -5.7% 1.53e+12 ± 2% perf-stat.total.instructions
6.29 ± 13% -2.2 4.11 ± 24% perf-profile.calltrace.cycles-pp.read
6.22 ± 13% -2.2 4.05 ± 24% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
6.21 ± 13% -2.2 4.04 ± 24% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
6.04 ± 13% -2.1 3.90 ± 24% perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
6.09 ± 13% -2.1 3.96 ± 24% perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
3.68 ± 17% -1.4 2.25 ± 36% perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.22 ± 16% -1.4 1.79 ± 27% perf-profile.calltrace.cycles-pp.open64
3.66 ± 16% -1.4 2.24 ± 36% perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
3.88 ± 13% -1.4 2.49 ± 20% perf-profile.calltrace.cycles-pp.seq_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.83 ± 13% -1.4 2.48 ± 19% perf-profile.calltrace.cycles-pp.seq_read_iter.seq_read.vfs_read.ksys_read.do_syscall_64
3.03 ± 17% -1.3 1.71 ± 26% perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
3.09 ± 17% -1.3 1.77 ± 27% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
3.08 ± 17% -1.3 1.76 ± 27% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
3.04 ± 17% -1.3 1.73 ± 26% perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
2.61 ± 14% -1.0 1.60 ± 20% perf-profile.calltrace.cycles-pp.proc_single_show.seq_read_iter.seq_read.vfs_read.ksys_read
2.58 ± 13% -1.0 1.58 ± 21% perf-profile.calltrace.cycles-pp.do_task_stat.proc_single_show.seq_read_iter.seq_read.vfs_read
0.99 ± 17% -0.5 0.46 ± 75% perf-profile.calltrace.cycles-pp.__xstat64
0.97 ± 18% -0.5 0.46 ± 75% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__xstat64
0.96 ± 18% -0.5 0.46 ± 75% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__xstat64
0.95 ± 18% -0.5 0.45 ± 75% perf-profile.calltrace.cycles-pp.__do_sys_newstat.do_syscall_64.entry_SYSCALL_64_after_hwframe.__xstat64
0.92 ± 19% -0.5 0.45 ± 75% perf-profile.calltrace.cycles-pp.vfs_fstatat.__do_sys_newstat.do_syscall_64.entry_SYSCALL_64_after_hwframe.__xstat64
0.72 ± 12% -0.3 0.40 ± 71% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
7.12 ± 13% -2.4 4.73 ± 22% perf-profile.children.cycles-pp.ksys_read
6.91 ± 12% -2.3 4.57 ± 23% perf-profile.children.cycles-pp.vfs_read
6.30 ± 13% -2.2 4.12 ± 24% perf-profile.children.cycles-pp.read
5.34 ± 12% -1.9 3.46 ± 25% perf-profile.children.cycles-pp.seq_read_iter
4.65 ± 13% -1.7 2.98 ± 31% perf-profile.children.cycles-pp.do_sys_openat2
4.67 ± 13% -1.7 3.01 ± 30% perf-profile.children.cycles-pp.__x64_sys_openat
4.43 ± 13% -1.6 2.86 ± 29% perf-profile.children.cycles-pp.do_filp_open
4.41 ± 13% -1.6 2.85 ± 29% perf-profile.children.cycles-pp.path_openat
3.23 ± 16% -1.4 1.80 ± 27% perf-profile.children.cycles-pp.open64
3.89 ± 13% -1.4 2.49 ± 20% perf-profile.children.cycles-pp.seq_read
2.61 ± 14% -1.0 1.60 ± 20% perf-profile.children.cycles-pp.proc_single_show
2.59 ± 13% -1.0 1.58 ± 21% perf-profile.children.cycles-pp.do_task_stat
1.66 ± 12% -0.7 0.96 ± 36% perf-profile.children.cycles-pp.lookup_fast
1.43 ± 16% -0.6 0.86 ± 29% perf-profile.children.cycles-pp.walk_component
1.50 ± 14% -0.5 0.96 ± 30% perf-profile.children.cycles-pp.link_path_walk
1.24 ± 10% -0.5 0.77 ± 32% perf-profile.children.cycles-pp.do_open
1.53 ± 7% -0.4 1.08 ± 19% perf-profile.children.cycles-pp.sched_setaffinity
1.02 ± 15% -0.4 0.64 ± 33% perf-profile.children.cycles-pp.__xstat64
1.10 ± 18% -0.4 0.72 ± 31% perf-profile.children.cycles-pp.__do_sys_newstat
1.09 ± 18% -0.4 0.73 ± 30% perf-profile.children.cycles-pp.path_lookupat
1.10 ± 18% -0.4 0.74 ± 29% perf-profile.children.cycles-pp.filename_lookup
1.07 ± 19% -0.4 0.72 ± 32% perf-profile.children.cycles-pp.vfs_fstatat
0.97 ± 9% -0.4 0.62 ± 34% perf-profile.children.cycles-pp.do_dentry_open
0.82 ± 19% -0.4 0.48 ± 34% perf-profile.children.cycles-pp.__d_lookup_rcu
0.94 ± 18% -0.3 0.61 ± 35% perf-profile.children.cycles-pp.vfs_statx
0.61 ± 11% -0.3 0.33 ± 32% perf-profile.children.cycles-pp.pid_revalidate
0.78 ± 14% -0.3 0.50 ± 29% perf-profile.children.cycles-pp.tlb_finish_mmu
0.64 ± 15% -0.3 0.37 ± 29% perf-profile.children.cycles-pp.getdents64
0.62 ± 16% -0.3 0.35 ± 28% perf-profile.children.cycles-pp.proc_pid_readdir
0.64 ± 15% -0.3 0.37 ± 29% perf-profile.children.cycles-pp.__x64_sys_getdents64
0.64 ± 15% -0.3 0.37 ± 29% perf-profile.children.cycles-pp.iterate_dir
0.61 ± 15% -0.3 0.35 ± 24% perf-profile.children.cycles-pp.__percpu_counter_init
0.96 ± 8% -0.3 0.71 ± 20% perf-profile.children.cycles-pp.evlist_cpu_iterator__next
1.03 ± 12% -0.2 0.78 ± 15% perf-profile.children.cycles-pp.__libc_read
0.75 ± 8% -0.2 0.53 ± 17% perf-profile.children.cycles-pp.__x64_sys_sched_setaffinity
0.39 ± 13% -0.2 0.19 ± 24% perf-profile.children.cycles-pp.__entry_text_start
0.40 ± 18% -0.2 0.22 ± 25% perf-profile.children.cycles-pp.ptrace_may_access
0.62 ± 7% -0.2 0.45 ± 17% perf-profile.children.cycles-pp.__sched_setaffinity
0.36 ± 16% -0.2 0.20 ± 25% perf-profile.children.cycles-pp.proc_fill_cache
0.57 ± 6% -0.2 0.40 ± 20% perf-profile.children.cycles-pp.__set_cpus_allowed_ptr
0.42 ± 21% -0.2 0.27 ± 38% perf-profile.children.cycles-pp.inode_permission
0.36 ± 20% -0.1 0.22 ± 25% perf-profile.children.cycles-pp._find_next_bit
0.39 ± 14% -0.1 0.25 ± 22% perf-profile.children.cycles-pp.__kmem_cache_alloc_node
0.44 ± 12% -0.1 0.30 ± 26% perf-profile.children.cycles-pp.pick_link
0.25 ± 18% -0.1 0.12 ± 19% perf-profile.children.cycles-pp.security_ptrace_access_check
0.32 ± 15% -0.1 0.19 ± 22% perf-profile.children.cycles-pp.__x64_sys_readlink
0.22 ± 13% -0.1 0.11 ± 33% perf-profile.children.cycles-pp.readlink
0.31 ± 14% -0.1 0.19 ± 22% perf-profile.children.cycles-pp.do_readlinkat
0.32 ± 11% -0.1 0.22 ± 30% perf-profile.children.cycles-pp.vfs_fstat
0.26 ± 19% -0.1 0.15 ± 26% perf-profile.children.cycles-pp.load_elf_interp
0.22 ± 17% -0.1 0.12 ± 32% perf-profile.children.cycles-pp.d_hash_and_lookup
0.21 ± 31% -0.1 0.12 ± 31% perf-profile.children.cycles-pp.may_open
0.30 ± 14% -0.1 0.21 ± 18% perf-profile.children.cycles-pp.copy_strings
0.24 ± 18% -0.1 0.14 ± 32% perf-profile.children.cycles-pp.unlink_anon_vmas
0.19 ± 19% -0.1 0.10 ± 32% perf-profile.children.cycles-pp.__kmalloc_node
0.29 ± 8% -0.1 0.21 ± 10% perf-profile.children.cycles-pp.affine_move_task
0.24 ± 21% -0.1 0.16 ± 24% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
0.22 ± 10% -0.1 0.14 ± 28% perf-profile.children.cycles-pp.mas_preallocate
0.24 ± 12% -0.1 0.16 ± 30% perf-profile.children.cycles-pp.mas_alloc_nodes
0.21 ± 14% -0.1 0.14 ± 20% perf-profile.children.cycles-pp.__d_alloc
0.10 ± 19% -0.1 0.03 ±100% perf-profile.children.cycles-pp.pid_task
0.14 ± 24% -0.1 0.06 ± 50% perf-profile.children.cycles-pp.single_open
0.20 ± 11% -0.1 0.12 ± 12% perf-profile.children.cycles-pp.cpu_stop_queue_work
0.18 ± 16% -0.1 0.11 ± 25% perf-profile.children.cycles-pp.generic_fillattr
0.14 ± 19% -0.1 0.07 ± 29% perf-profile.children.cycles-pp.apparmor_ptrace_access_check
0.14 ± 23% -0.1 0.08 ± 30% perf-profile.children.cycles-pp.native_flush_tlb_one_user
0.10 ± 10% -0.1 0.04 ± 71% perf-profile.children.cycles-pp.vfs_readlink
0.09 ± 19% -0.1 0.03 ±100% perf-profile.children.cycles-pp.aa_get_task_label
0.14 ± 25% -0.1 0.08 ± 23% perf-profile.children.cycles-pp.proc_pid_get_link
0.16 ± 21% -0.1 0.10 ± 28% perf-profile.children.cycles-pp.thread_group_cputime_adjusted
0.19 ± 15% -0.1 0.13 ± 27% perf-profile.children.cycles-pp.strnlen_user
0.18 ± 27% -0.1 0.11 ± 21% perf-profile.children.cycles-pp.wq_worker_comm
0.18 ± 13% -0.1 0.11 ± 36% perf-profile.children.cycles-pp.vfs_getattr_nosec
0.17 ± 16% -0.1 0.11 ± 24% perf-profile.children.cycles-pp.proc_pid_cmdline_read
0.12 ± 10% -0.1 0.06 ± 48% perf-profile.children.cycles-pp.terminate_walk
0.14 ± 18% -0.1 0.09 ± 27% perf-profile.children.cycles-pp.thread_group_cputime
0.13 ± 21% -0.0 0.08 ± 27% perf-profile.children.cycles-pp.get_obj_cgroup_from_current
0.14 ± 18% -0.0 0.10 ± 26% perf-profile.children.cycles-pp.get_mm_cmdline
0.14 ± 10% -0.0 0.10 ± 17% perf-profile.children.cycles-pp.wake_up_q
1.37 ± 16% -0.6 0.81 ± 23% perf-profile.self.cycles-pp.do_task_stat
0.80 ± 18% -0.3 0.46 ± 34% perf-profile.self.cycles-pp.__d_lookup_rcu
0.39 ± 15% -0.2 0.19 ± 33% perf-profile.self.cycles-pp.pid_revalidate
0.37 ± 11% -0.2 0.18 ± 22% perf-profile.self.cycles-pp.__entry_text_start
0.36 ± 14% -0.2 0.21 ± 37% perf-profile.self.cycles-pp.do_dentry_open
0.44 ± 17% -0.1 0.31 ± 24% perf-profile.self.cycles-pp.gather_pte_stats
0.23 ± 15% -0.1 0.14 ± 14% perf-profile.self.cycles-pp.__kmem_cache_alloc_node
0.10 ± 18% -0.1 0.03 ±100% perf-profile.self.cycles-pp.pid_task
0.21 ± 17% -0.1 0.14 ± 25% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
0.14 ± 23% -0.1 0.08 ± 30% perf-profile.self.cycles-pp.native_flush_tlb_one_user
0.16 ± 23% -0.1 0.09 ± 26% perf-profile.self.cycles-pp.generic_fillattr
0.09 ± 20% -0.1 0.03 ±101% perf-profile.self.cycles-pp.unlink_anon_vmas
0.10 ± 25% -0.1 0.04 ± 76% perf-profile.self.cycles-pp.proc_fill_cache
0.12 ± 20% -0.1 0.06 ± 58% perf-profile.self.cycles-pp.lookup_fast
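
A note on reading the rows above whose middle column has no "%" sign (the *-rate% metrics and the perf-profile entries): judging from the numbers, the change there is an absolute difference in percentage points rather than a relative change. A minimal sketch, assuming that interpretation; point_change is a made-up helper name, not an lkp function.

```python
def point_change(base: float, new: float) -> float:
    """Absolute difference in percentage points (assumed interpretation)."""
    return new - base


# (base, new) values copied from the full comparison above.
samples = {
    "perf-stat.i.node-store-miss-rate%": (37.89, 34.28),
    "perf-stat.overall.node-store-miss-rate%": (34.70, 29.02),
    "perf-profile.calltrace.cycles-pp.read": (6.29, 4.11),
}

for name, (base, new) in samples.items():
    # These metrics are already percentages, so the values are subtracted directly.
    print(f"{name:42s} {point_change(base, new):+.1f}")
```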



>
> Mel/PeterZ,
>
> Whenever time permits can you please let us know your comments/concerns
> on the series?
>
> Thanks and Regards
> - Raghu
>