[linus:master] [cpuidle] 5484e31bbb: adrestia.wakeup_cost_periodic_us -33.3% improvement

From: kernel test robot
Date: Tue Sep 05 2023 - 21:18:57 EST




Hello,

kernel test robot noticed a 33.3% improvement in adrestia.wakeup_cost_periodic_us (a drop from 6.00 to 4.00 us) on:


commit: 5484e31bbbff285f9505c4766373f840ffb746e5 ("cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: adrestia
test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz (Haswell) with 8G memory
parameters:

	nr_threads: 100
	cpufreq_governor: performance



Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230905/202309051653.1dce02c8-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/testcase:
gcc-12/performance/x86_64-rhel-8.3/100/debian-11.1-x86_64-20220510.cgz/lkp-hsw-d04/adrestia

commit:
  2662342079 ("cpuidle: teo: Gather statistics regarding whether or not to stop the tick")
  5484e31bbb ("cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases")

2662342079f54b8a 5484e31bbbff285f9505c476637
---------------- ---------------------------
value ± %stddev    %change    value ± %stddev    metric
0.06 ± 56% -52.2% 0.03 ± 17% perf-sched.sch_delay.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
9603 -0.8% 9529 proc-vmstat.nr_slab_unreclaimable
12707 -6.7% 11859 vmstat.system.in
0.91 -0.2 0.74 mpstat.cpu.all.irq%
0.06 -0.0 0.05 mpstat.cpu.all.soft%
698830 ± 11% +18.4% 827374 ± 7% sched_debug.cpu.avg_idle.max
222705 ± 10% +18.7% 264460 ± 5% sched_debug.cpu.avg_idle.stddev
0.34 ± 16% +35.5% 0.47 ± 15% sched_debug.cpu.clock.stddev
242150 -79.0% 50912 ± 7% adrestia.time.involuntary_context_switches
38.40 -5.7% 36.20 adrestia.time.percent_of_cpu_this_job_got
138.60 -6.7% 129.34 adrestia.time.system_time
6.00 -33.3% 4.00 adrestia.wakeup_cost_periodic_us
5674120 +110.4% 11939267 turbostat.C1
33.31 +5.8 39.11 turbostat.C1%
3296324 +24.3% 4096313 turbostat.C1E
4.70 +6.5 11.18 ± 2% turbostat.C1E%
5791021 -48.2% 3001325 ± 2% turbostat.C3
17.79 +4.8 22.59 turbostat.C3%
810414 -85.6% 117003 ± 4% turbostat.C6
8.11 -6.8 1.31 ± 5% turbostat.C6%
1442211 -52.8% 680532 ± 2% turbostat.C7s
23.74 -9.1 14.69 ± 2% turbostat.C7s%
61.86 +20.3% 74.43 turbostat.CPU%c1
14.52 ± 2% -28.7% 10.35 ± 3% turbostat.CPU%c3
3.89 -93.4% 0.26 ± 11% turbostat.CPU%c6
7.81 ± 5% -49.6% 3.94 ± 4% turbostat.CPU%c7
10.66 +3.9% 11.08 turbostat.CorWatt
3.49 -0.5 3.02 turbostat.POLL%
1.48 ± 2% -75.0% 0.37 ± 18% turbostat.Pkg%pc2
18.29 +5.4% 19.28 turbostat.PkgWatt
12.72 -16.8% 10.59 perf-stat.i.MPKI
4.574e+08 -3.8% 4.398e+08 perf-stat.i.branch-instructions
1.46 -0.1 1.31 perf-stat.i.branch-miss-rate%
7284139 -8.1% 6690632 perf-stat.i.branch-misses
2.68 -0.9 1.80 perf-stat.i.cache-miss-rate%
458457 -41.3% 268958 perf-stat.i.cache-misses
18029497 -17.1% 14954548 perf-stat.i.cache-references
2.25 -6.0% 2.11 perf-stat.i.cpi
3.612e+09 -8.1% 3.318e+09 perf-stat.i.cpu-cycles
9738 -64.2% 3490 ± 4% perf-stat.i.cpu-migrations
9532 +49.8% 14284 perf-stat.i.cycles-between-cache-misses
0.58 ± 2% -0.2 0.43 ± 4% perf-stat.i.dTLB-load-miss-rate%
1915780 -22.0% 1494726 ± 7% perf-stat.i.dTLB-load-misses
0.31 ± 3% -0.1 0.25 ± 5% perf-stat.i.dTLB-store-miss-rate%
611947 ± 2% -22.2% 476112 ± 3% perf-stat.i.dTLB-store-misses
3.11e+08 -4.8% 2.962e+08 perf-stat.i.dTLB-stores
50.70 -12.1 38.63 perf-stat.i.iTLB-load-miss-rate%
709004 -14.4% 606971 ± 4% perf-stat.i.iTLB-load-misses
641330 +22.2% 783541 perf-stat.i.iTLB-loads
2.025e+09 -3.2% 1.96e+09 perf-stat.i.instructions
2772 +23.4% 3420 ± 2% perf-stat.i.instructions-per-iTLB-miss
0.48 +5.4% 0.50 perf-stat.i.ipc
0.45 -8.1% 0.41 perf-stat.i.metric.GHz
161.95 -3.4% 156.41 perf-stat.i.metric.M/sec
8.90 -14.3% 7.63 perf-stat.overall.MPKI
1.59 -0.1 1.52 perf-stat.overall.branch-miss-rate%
2.54 -0.7 1.80 perf-stat.overall.cache-miss-rate%
1.78 -5.1% 1.69 perf-stat.overall.cpi
7878 +56.6% 12335 perf-stat.overall.cycles-between-cache-misses
0.37 -0.1 0.30 ± 6% perf-stat.overall.dTLB-load-miss-rate%
0.20 ± 2% -0.0 0.16 ± 3% perf-stat.overall.dTLB-store-miss-rate%
52.51 -8.9 43.63 ± 2% perf-stat.overall.iTLB-load-miss-rate%
2856 +13.2% 3234 ± 4% perf-stat.overall.instructions-per-iTLB-miss
0.56 +5.4% 0.59 perf-stat.overall.ipc
4.565e+08 -3.8% 4.39e+08 perf-stat.ps.branch-instructions
7270403 -8.1% 6677885 perf-stat.ps.branch-misses
457602 -41.3% 268449 perf-stat.ps.cache-misses
17995840 -17.1% 14926426 perf-stat.ps.cache-references
3.605e+09 -8.1% 3.312e+09 perf-stat.ps.cpu-cycles
9720 -64.2% 3484 ± 4% perf-stat.ps.cpu-migrations
1912206 -22.0% 1491912 ± 7% perf-stat.ps.dTLB-load-misses
610806 ± 2% -22.2% 475217 ± 3% perf-stat.ps.dTLB-store-misses
3.104e+08 -4.8% 2.956e+08 perf-stat.ps.dTLB-stores
707671 -14.4% 605828 ± 4% perf-stat.ps.iTLB-load-misses
640132 +22.2% 782070 perf-stat.ps.iTLB-loads
2.021e+09 -3.2% 1.957e+09 perf-stat.ps.instructions
1.089e+12 -3.9% 1.047e+12 perf-stat.total.instructions
28.35 ± 5% -3.8 24.57 ± 11% perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
70.22 -2.0 68.23 perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
2.07 ± 6% -0.5 1.57 ± 11% perf-profile.calltrace.cycles-pp.menu_select.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
3.95 ± 5% -0.4 3.57 ± 3% perf-profile.calltrace.cycles-pp.__schedule.schedule.pipe_read.vfs_read.ksys_read
4.02 ± 4% -0.4 3.65 ± 2% perf-profile.calltrace.cycles-pp.schedule.pipe_read.vfs_read.ksys_read.do_syscall_64
0.82 ± 14% +0.2 1.03 ± 15% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
2.07 +0.2 2.29 ± 9% perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write.start_thread
0.86 ± 15% +0.2 1.08 ± 12% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
3.22 ± 3% +0.3 3.53 ± 6% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write.start_thread
3.27 ± 3% +0.3 3.59 ± 6% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_write.start_thread
1.30 ± 12% +0.4 1.71 ± 12% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
1.41 ± 11% +0.4 1.84 ± 10% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.21 ±122% +0.4 0.66 ± 8% perf-profile.calltrace.cycles-pp.prepare_to_wait_event.pipe_read.vfs_read.ksys_read.do_syscall_64
4.08 ± 3% +0.6 4.65 ± 8% perf-profile.calltrace.cycles-pp.__libc_write.start_thread
15.14 ± 2% +0.9 16.08 ± 6% perf-profile.calltrace.cycles-pp.start_thread
28.56 ± 5% -3.8 24.73 ± 11% perf-profile.children.cycles-pp.poll_idle
70.19 -2.0 68.19 perf-profile.children.cycles-pp.do_idle
70.22 -2.0 68.23 perf-profile.children.cycles-pp.secondary_startup_64_no_verify
70.22 -2.0 68.23 perf-profile.children.cycles-pp.cpu_startup_entry
2.40 ± 4% -0.6 1.80 ± 9% perf-profile.children.cycles-pp.menu_select
0.74 ± 18% -0.4 0.33 ± 32% perf-profile.children.cycles-pp.newidle_balance
1.47 ± 8% -0.4 1.05 ± 14% perf-profile.children.cycles-pp.pick_next_task_fair
4.13 ± 4% -0.4 3.72 ± 2% perf-profile.children.cycles-pp.schedule
0.76 ± 13% -0.4 0.38 ± 26% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.58 ± 19% -0.3 0.29 ± 24% perf-profile.children.cycles-pp.switch_mm_irqs_off
0.55 ± 14% -0.3 0.26 ± 45% perf-profile.children.cycles-pp.load_balance
0.42 ± 23% -0.2 0.24 ± 32% perf-profile.children.cycles-pp.tick_nohz_next_event
0.22 ± 17% -0.1 0.09 ± 31% perf-profile.children.cycles-pp.hrtimer_next_event_without
0.21 ± 31% -0.1 0.09 ± 65% perf-profile.children.cycles-pp.select_idle_cpu
0.16 ± 25% -0.1 0.06 ± 88% perf-profile.children.cycles-pp.set_task_cpu
0.19 ± 47% -0.1 0.09 ± 42% perf-profile.children.cycles-pp.leave_mm
0.27 ± 18% -0.1 0.18 ± 23% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.13 ± 32% -0.1 0.04 ± 90% perf-profile.children.cycles-pp.__hrtimer_next_event_base
0.26 ± 3% -0.1 0.18 ± 13% perf-profile.children.cycles-pp.tick_nohz_idle_enter
0.09 ± 26% +0.1 0.16 ± 28% perf-profile.children.cycles-pp.clockevents_program_event
0.44 ± 9% +0.1 0.56 ± 9% perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
0.37 ± 3% +0.2 0.54 ± 11% perf-profile.children.cycles-pp.mutex_unlock
0.46 ± 13% +0.2 0.68 ± 10% perf-profile.children.cycles-pp.prepare_to_wait_event
1.63 ± 3% +0.2 1.87 ± 5% perf-profile.children.cycles-pp.__entry_text_start
1.75 ± 12% +0.3 2.04 ± 11% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
1.87 ± 11% +0.3 2.19 ± 9% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
1.32 ± 4% +0.4 1.76 ± 7% perf-profile.children.cycles-pp.syscall_return_via_sysret
13.54 ± 2% +0.6 14.15 ± 2% perf-profile.children.cycles-pp.__libc_read
15.14 ± 2% +0.9 16.08 ± 6% perf-profile.children.cycles-pp.start_thread
27.99 ± 6% -3.7 24.26 ± 11% perf-profile.self.cycles-pp.poll_idle
0.56 ± 20% -0.3 0.28 ± 24% perf-profile.self.cycles-pp.switch_mm_irqs_off
1.34 ± 5% -0.3 1.07 ± 11% perf-profile.self.cycles-pp.menu_select
0.24 ± 12% -0.1 0.18 ± 8% perf-profile.self.cycles-pp.update_curr
0.14 ± 12% -0.0 0.11 ± 10% perf-profile.self.cycles-pp.touch_atime
0.03 ±127% +0.1 0.09 ± 24% perf-profile.self.cycles-pp.copy_page_from_iter
0.21 ± 17% +0.1 0.35 ± 19% perf-profile.self.cycles-pp.prepare_to_wait_event
0.44 ± 20% +0.2 0.59 ± 13% perf-profile.self.cycles-pp.pipe_write
0.36 ± 6% +0.2 0.54 ± 12% perf-profile.self.cycles-pp.mutex_unlock
1.42 ± 4% +0.2 1.60 ± 6% perf-profile.self.cycles-pp.__entry_text_start
1.32 ± 4% +0.4 1.76 ± 6% perf-profile.self.cycles-pp.syscall_return_via_sysret
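
As a quick sanity check on how the %change column reads, here is a minimal
Python sketch that recomputes a few of the rows above. The helper names are
illustrative and not part of lkp-tests. It assumes that a change printed with
a '%' suffix is the relative change against the base commit, and that a change
printed without one (used for metrics that are already percentages) is the
plain difference in percentage points. Both assumptions agree with the rows
sampled here.

```python
def relative_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def point_delta(base: float, new: float) -> float:
    """Absolute difference, used for metrics that are already percentages."""
    return new - base


# (base commit value, patched commit value), copied from the table above.
relative_rows = {
    "adrestia.wakeup_cost_periodic_us": (6.00, 4.00),
    "adrestia.time.involuntary_context_switches": (242150, 50912),
    "turbostat.C1": (5674120, 11939267),
}
point_rows = {
    "turbostat.C1%": (33.31, 39.11),
    "turbostat.C6%": (8.11, 1.31),
    "mpstat.cpu.all.irq%": (0.91, 0.74),
}

for name, (base, new) in relative_rows.items():
    print(f"{relative_change(base, new):+7.1f}%  {name}")

for name, (base, new) in point_rows.items():
    print(f"{point_delta(base, new):+7.1f}   {name}")
```

Rounded to one decimal place, these reproduce the -33.3%, -79.0%, +110.4%,
+5.8, -6.8 and -0.2 entries reported above.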



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki