[linus:master] [sched/core] e932c4ab38: aim9.sync_disk_cp.ops_per_sec 2.3% improvement
From: kernel test robot
Date: Tue Dec 24 2024 - 03:34:32 EST
Hello,
kernel test robot noticed a 2.3% improvement of aim9.sync_disk_cp.ops_per_sec on:
commit: e932c4ab38f072ce5894b2851fea8bc5754bb8e5 ("sched/core: Prevent wakeup of ksoftirqd during idle load balance")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
testcase: aim9
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 4 threads Intel(R) Xeon(R) CPU E3-1225 v5 @ 3.30GHz (Skylake) with 16G memory
parameters:

	testtime: 300s
	test: sync_disk_cp
	cpufreq_governor: performance
In addition to that, the commit also has significant impact on the following tests:
+------------------+-----------------------------------------------------------------------------+
| testcase: change | vm-scalability: vm-scalability.throughput 2.4% improvement |
| test machine | 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory |
| test parameters | cpufreq_governor=performance |
| | runtime=300s |
| | test=migrate |
+------------------+-----------------------------------------------------------------------------+
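Both headline figures can be re-derived from the before/after values listed in the detail tables below (aim9.sync_disk_cp.ops_per_sec and vm-scalability.throughput). Here is a minimal Python sketch. It assumes that %change is the plain relative difference (new - old) / old x 100, rounded to one decimal:

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (assumed %change formula)."""
    return (new - old) / old * 100.0


# Values for the parent (ff47a0acfc) and patched (e932c4ab38) kernels,
# copied from the detail tables in this report.
headline = {
    "aim9.sync_disk_cp.ops_per_sec": (779244, 797195),
    "vm-scalability.throughput": (2422987, 2480833),
}

for metric, (old, new) in headline.items():
    print(f"{metric}: {pct_change(old, new):+.1f}%")
# aim9.sync_disk_cp.ops_per_sec: +2.3%
# vm-scalability.throughput: +2.4%
```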
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241224/202412241607.dc13db91-lkp@xxxxxxxxx
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-skl-d06/sync_disk_cp/aim9/300s
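The slash-separated line above pairs each key in the first row with the value in the same position in the second row. The small Python helper below makes that pairing explicit. It assumes that no value contains a '/' itself, which holds for both parameter lines in this report:

```python
def parse_lkp_params(keys_line: str, values_line: str) -> dict[str, str]:
    """Pair each slash-separated key with the value in the same position."""
    keys = keys_line.rstrip(":").split("/")
    values = values_line.split("/")
    if len(keys) != len(values):
        raise ValueError(f"{len(keys)} keys but {len(values)} values")
    return dict(zip(keys, values))


params = parse_lkp_params(
    "compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:",
    "gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz"
    "/lkp-skl-d06/sync_disk_cp/aim9/300s",
)
for key, value in params.items():
    print(f"{key:>16} = {value}")
```

The same call also works for the vm-scalability header further down. There, runtime appears in the position that testtime occupies here.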
commit: 
  ff47a0acfc ("sched/fair: Check idle_cpu() before need_resched() to detect ilb CPU turning busy")
  e932c4ab38 ("sched/core: Prevent wakeup of ksoftirqd during idle load balance")
ff47a0acfcce309c e932c4ab38f072ce5894b2851fe
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
    779244            +2.3%     797195        aim9.sync_disk_cp.ops_per_sec
    444185 ±  2%     -51.7%     214738 ±  3%  cpuidle..usage
     40.83 ± 15%     -84.5%       6.33 ± 23%  perf-c2c.HITM.local
   6505472 ± 12%     +21.6%    7908010 ±  4%  meminfo.DirectMap2M
     29200           -10.3%      26194        meminfo.Shmem
      0.08 ±  2%      -0.0        0.06 ±  2%  mpstat.cpu.all.irq%
      0.04 ±  3%      -0.0        0.03 ±  4%  mpstat.cpu.all.soft%
      2562 ±  2%     -60.3%       1018        vmstat.system.cs
      2343           -23.3%       1798        vmstat.system.in
    117335           -53.2%      54952        sched_debug.cpu.nr_switches.avg
    285639 ±  5%     -71.9%      80403 ±  5%  sched_debug.cpu.nr_switches.max
    100396 ±  9%     -77.1%      22968 ± 14%  sched_debug.cpu.nr_switches.stddev
      7316           -10.5%       6550        proc-vmstat.nr_shmem
  58767234            +2.4%   60172860        proc-vmstat.numa_hit
  58984855            +2.0%   60176451        proc-vmstat.numa_local
  58862408            +2.3%   60212415        proc-vmstat.pgalloc_normal
  58848231            +2.3%   60198260        proc-vmstat.pgfree
 7.448e+08            +1.7%  7.574e+08        perf-stat.i.branch-instructions
      1.35            -0.1        1.29        perf-stat.i.branch-miss-rate%
  65562189 ±  2%      -4.9%   62378502        perf-stat.i.cache-references
      2571 ±  2%     -60.5%       1016        perf-stat.i.context-switches
 3.732e+09            +1.8%  3.797e+09        perf-stat.i.instructions
      0.14 ±  3%     -87.0%        0.02        perf-stat.i.metric.K/sec
 7.426e+08            +1.7%   7.55e+08        perf-stat.ps.branch-instructions
  65356430 ±  2%      -4.9%   62171508        perf-stat.ps.cache-references
      2563 ±  2%     -60.5%       1012        perf-stat.ps.context-switches
  3.72e+09            +1.7%  3.785e+09        perf-stat.ps.instructions
  1.12e+12            +1.8%   1.14e+12        perf-stat.total.instructions
      0.02 ± 25%     +78.4%        0.03 ± 18%  perf-sched.sch_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      0.02 ± 55%     +82.3%        0.04 ± 16%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.__flush_work.__lru_add_drain_all
      0.04 ± 21%     +87.3%        0.07 ± 21%  perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.01 ±  9%     +35.2%        0.02 ±  6%  perf-sched.total_sch_delay.average.ms
     20.34 ±  5%    +111.1%       42.94        perf-sched.total_wait_and_delay.average.ms
      7025 ±  6%     -54.0%       3228        perf-sched.total_wait_and_delay.count.ms
      3058 ± 20%     +63.5%       4998        perf-sched.total_wait_and_delay.max.ms
     20.33 ±  5%    +111.1%       42.92        perf-sched.total_wait_time.average.ms
      3058 ± 20%     +63.5%       4998        perf-sched.total_wait_time.max.ms
    202.58 ± 18%     +94.7%      394.49 ±  9%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    609.98 ±  5%     -17.9%      500.63        perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      9.01 ± 12%   +6133.8%      561.38 ± 15%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      3837 ± 12%     -98.6%       52.17 ± 12%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1349 ± 39%    +270.4%       4998        perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      2785 ± 16%     -64.1%       1001        perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    202.50 ± 18%     +94.8%      394.38 ±  9%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    609.95 ±  5%     -17.9%      500.56        perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      9.00 ± 12%   +6140.7%      561.36 ± 15%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1349 ± 39%    +270.4%       4998        perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      2785 ± 16%     -64.1%       1001        perf-sched.wait_time.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      1.51 ±  6%      -0.9        0.64 ± 11%  perf-profile.calltrace.cycles-pp.common_startup_64
      1.51 ±  6%      -0.9        0.64 ± 11%  perf-profile.children.cycles-pp.common_startup_64
      1.51 ±  6%      -0.9        0.64 ± 11%  perf-profile.children.cycles-pp.cpu_startup_entry
      1.51 ±  6%      -0.9        0.64 ± 11%  perf-profile.children.cycles-pp.do_idle
      1.12 ±  6%      -0.6        0.49 ± 13%  perf-profile.children.cycles-pp.cpuidle_idle_call
      0.92 ±  5%      -0.5        0.42 ± 17%  perf-profile.children.cycles-pp.cpuidle_enter
      0.92 ±  5%      -0.5        0.42 ± 17%  perf-profile.children.cycles-pp.cpuidle_enter_state
      0.50 ±  6%      -0.3        0.21 ± 12%  perf-profile.children.cycles-pp.intel_idle
      0.52 ±  8%      -0.2        0.33 ±  7%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.48 ±  6%      -0.2        0.31 ±  7%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.27 ± 18%      -0.2        0.10 ± 36%  perf-profile.children.cycles-pp.__schedule
      0.20 ± 12%      -0.2        0.04 ± 73%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.21 ± 13%      -0.2        0.06 ± 20%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
      0.24 ±  9%      -0.1        0.11 ± 12%  perf-profile.children.cycles-pp.ret_from_fork
      0.24 ±  9%      -0.1        0.11 ± 12%  perf-profile.children.cycles-pp.ret_from_fork_asm
      0.24 ±  9%      -0.1        0.11 ± 10%  perf-profile.children.cycles-pp.kthread
      0.18 ±  8%      -0.1        0.05 ± 49%  perf-profile.children.cycles-pp.schedule
      0.31 ±  9%      -0.1        0.19 ±  8%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.30 ±  9%      -0.1        0.19 ±  8%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.25 ±  8%      -0.1        0.16 ±  7%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.11 ± 11%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.try_to_block_task
      0.10 ± 13%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.dequeue_task_fair
      0.21 ± 12%      -0.1        0.14 ±  5%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.10 ± 14%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.dequeue_entities
      0.17 ± 13%      -0.1        0.10 ±  4%  perf-profile.children.cycles-pp.update_process_times
      0.11 ± 12%      -0.0        0.07 ±  6%  perf-profile.children.cycles-pp.sched_tick
     40.09            +0.6       40.66        perf-profile.children.cycles-pp.read
      0.50 ±  6%      -0.3        0.21 ± 12%  perf-profile.self.cycles-pp.intel_idle
      0.97 ±  4%      +0.1        1.05 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
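In the table above, the middle column is not always a percentage. For metrics that are themselves percentages, it holds the absolute difference in percentage points. These include names ending in '%', such as mpstat.cpu.all.irq%, and the perf-profile.* cycle shares. This is why do_idle reads "1.51 -0.9 0.64" rather than "-57.6%".

The Python sketch below reproduces that convention as inferred from the rows of this report. The classification rule is an assumption, not lkp-tests code.

Recomputing from the displayed, already-rounded values can drift from the printed figure when the values are small. For example, smpboot_thread_fn wait_and_delay.avg gives +6130.6% from 9.01 and 561.38, against +6133.8% in the table. The robot presumably works from unrounded means.

```python
def is_percentage_metric(name: str) -> bool:
    # Assumed rule, inferred from this report: names ending in '%' and the
    # perf-profile.* cycle shares already hold percentages.
    return name.endswith("%") or name.startswith("perf-profile.")


def change_column(name: str, old: float, new: float) -> str:
    """Format the %change column the way this report appears to."""
    if is_percentage_metric(name):
        # Absolute difference in percentage points, one decimal, no '%' sign.
        return f"{new - old:+.1f}"
    # Relative change in percent, one decimal.
    return f"{(new - old) / old * 100.0:+.1f}%"


rows = [
    ("cpuidle..usage", 444185, 214738),                       # printed -51.7%
    ("vmstat.system.cs", 2562, 1018),                         # printed -60.3%
    ("mpstat.cpu.all.irq%", 0.08, 0.06),                      # printed -0.0
    ("perf-profile.children.cycles-pp.do_idle", 1.51, 0.64),  # printed -0.9
    ("perf-profile.children.cycles-pp.read", 40.09, 40.66),   # printed +0.6
]
for name, old, new in rows:
    print(f"{change_column(name, old, new):>8}  {name}")
```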
***************************************************************************************************
lkp-skl-d03: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-skl-d03/migrate/vm-scalability
commit: 
  ff47a0acfc ("sched/fair: Check idle_cpu() before need_resched() to detect ilb CPU turning busy")
  e932c4ab38 ("sched/core: Prevent wakeup of ksoftirqd during idle load balance")
ff47a0acfcce309c e932c4ab38f072ce5894b2851fe
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
    181821           -12.5%     159050        meminfo.Mapped
      0.02 ±  4%     -20.4%        0.01 ±  5%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     45923           -12.8%      40022        proc-vmstat.nr_mapped
      1.00 ± 99%    -100.0%        0.00 ± 52%  vm-scalability.free_time
   2422987            +2.4%    2480833        vm-scalability.median
   2422987            +2.4%    2480833        vm-scalability.throughput
     90071            +2.5%      92323        vm-scalability.time.involuntary_context_switches
      3.03 ±  3%      -0.2        2.84 ±  3%  perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.__mmput.exit_mm.do_exit
      2.84 ±  2%      -0.2        2.67 ±  3%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.__mmput.exit_mm
      6.04 ±  2%      -0.2        5.88        perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
      6.06 ±  2%      -0.2        5.89        perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      2.78 ±  2%      -0.1        2.64 ±  2%  perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_pte_missing.__handle_mm_fault.handle_mm_fault
      2.90            -0.1        2.77 ±  2%  perf-profile.calltrace.cycles-pp.do_read_fault.do_pte_missing.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.90 ±  4%      +0.1        0.95 ±  3%  perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.92 ±  3%      +0.1        0.99 ±  2%  perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.77 ±  4%      +0.1        0.84 ±  3%  perf-profile.calltrace.cycles-pp.__mmap_region.do_mmap.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.80 ±  7%      +0.1        0.91 ±  7%  perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      4.90 ±  2%      -0.2        4.70 ±  2%  perf-profile.children.cycles-pp.do_read_fault
      6.09 ±  2%      -0.2        5.92        perf-profile.children.cycles-pp.exit_mm
      0.54 ±  2%      -0.1        0.49 ±  8%  perf-profile.children.cycles-pp.___perf_sw_event
      0.39 ±  5%      -0.0        0.35 ±  6%  perf-profile.children.cycles-pp.vfs_open
      0.20 ±  4%      -0.0        0.16 ± 10%  perf-profile.children.cycles-pp.opendir
      0.15 ±  8%      +0.0        0.19 ±  5%  perf-profile.children.cycles-pp.__kmalloc_cache_noprof
      0.18 ±  6%      +0.0        0.22 ± 11%  perf-profile.children.cycles-pp.__kernel_read
      0.29 ±  5%      +0.0        0.34 ±  5%  perf-profile.children.cycles-pp.filemap_read
      1.17 ±  4%      +0.1        1.28 ±  4%  perf-profile.children.cycles-pp.__memcg_slab_free_hook
      0.44 ±  4%      -0.0        0.40 ±  8%  perf-profile.self.cycles-pp.___perf_sw_event
      0.07 ± 15%      -0.0        0.04 ± 71%  perf-profile.self.cycles-pp.__folio_batch_add_and_move
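The "± N%" annotations give the run-to-run spread of each value relative to its mean. A change that is small next to that spread therefore carries less weight than the headline rows. vm-scalability.free_time above, at ± 99%, is an example.

Below is a minimal Python sketch of that relative spread, computed as a coefficient of variation. Two assumptions apply. The per-run samples are hypothetical, because the report lists only aggregated values. Using the sample rather than the population standard deviation is also an assumption.

```python
import statistics


def rel_stddev_pct(samples: list[float]) -> float:
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(samples) / statistics.mean(samples) * 100.0


def annotate(samples: list[float]) -> str:
    """Render a value the way the tables do: mean, then the '± N%' spread."""
    mean = statistics.mean(samples)
    return f"{mean:.0f} ± {rel_stddev_pct(samples):2.0f}%"


# Hypothetical per-run results for one metric (not taken from this report).
runs = [100.0, 105.0, 95.0]
print(annotate(runs))  # 100 ±  5%
```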
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki