[linus:master] [sched/fair] 5e963f2bd4: will-it-scale.per_thread_ops 2.5% improvement

From: kernel test robot
Date: Wed Sep 13 2023 - 11:09:22 EST



Hi Peter Zijlstra,

Yu helped review this report. It may not be as valuable as the
hackbench/netperf reports for EEVDF, which show huge performance differences,
but we are reporting it just FYI since the results stayed quite stable even
after rebuilding the kernel and doing more reruns.


Hello,

kernel test robot noticed a 2.5% improvement of will-it-scale.per_thread_ops on:


commit: 5e963f2bd4654a202a8a05aa3a86cb0300b10e6c ("sched/fair: Commit to EEVDF")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: will-it-scale
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:

nr_task: 100%
mode: thread
test: context_switch1
cpufreq_governor: performance



Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230913/202309132209.cae4f58a-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/thread/100%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/context_switch1/will-it-scale

commit:
e8f331bcc2 ("sched/smp: Use lag to simplify cross-runqueue placement")
5e963f2bd4 ("sched/fair: Commit to EEVDF")

e8f331bcc270354a 5e963f2bd4654a202a8a05aa3a8
---------------- ---------------------------
%stddev %change %stddev
\ | \
18121238 +2.5% 18575201 vmstat.system.cs
18317774 +2.5% 18781349 will-it-scale.104.threads
176131 +2.5% 180589 will-it-scale.per_thread_ops
18317774 +2.5% 18781349 will-it-scale.workload
1.257e+08 -96.7% 4139803 sched_debug.sysctl_sched.sysctl_sched_features
0.75 -100.0% 0.00 sched_debug.sysctl_sched.sysctl_sched_idle_min_granularity
24.00 -100.0% 0.00 sched_debug.sysctl_sched.sysctl_sched_latency
4.00 -100.0% 0.00 sched_debug.sysctl_sched.sysctl_sched_wakeup_granularity
1.65 +0.0 1.68 perf-stat.i.branch-miss-rate%
4.185e+08 +1.3% 4.24e+08 perf-stat.i.branch-misses
18284380 +2.5% 18745294 perf-stat.i.context-switches
0.10 +0.0 0.10 perf-stat.i.dTLB-load-miss-rate%
37343347 +2.5% 38269096 perf-stat.i.dTLB-load-misses
3.711e+10 -1.1% 3.671e+10 perf-stat.i.dTLB-loads
2.231e+10 -1.0% 2.208e+10 perf-stat.i.dTLB-stores
60.89 +15.4 76.32 perf-stat.i.iTLB-load-miss-rate%
42744641 ± 3% +60.6% 68665465 ± 3% perf-stat.i.iTLB-load-misses
27283919 -21.7% 21361180 ± 2% perf-stat.i.iTLB-loads
3211 ± 3% -37.7% 2001 ± 3% perf-stat.i.instructions-per-iTLB-miss
0.10 +0.0 0.10 perf-stat.overall.dTLB-load-miss-rate%
61.02 +15.2 76.26 perf-stat.overall.iTLB-load-miss-rate%
3060 ± 3% -38.2% 1890 ± 3% perf-stat.overall.instructions-per-iTLB-miss
2146287 -3.2% 2077784 perf-stat.overall.path-length
4.171e+08 +1.3% 4.226e+08 perf-stat.ps.branch-misses
18221874 +2.5% 18680885 perf-stat.ps.context-switches
37218153 +2.5% 38141041 perf-stat.ps.dTLB-load-misses
3.699e+10 -1.1% 3.659e+10 perf-stat.ps.dTLB-loads
2.223e+10 -1.0% 2.201e+10 perf-stat.ps.dTLB-stores
42595400 ± 3% +60.6% 68425583 ± 3% perf-stat.ps.iTLB-load-misses
27192032 -21.7% 21288405 ± 2% perf-stat.ps.iTLB-loads
25.66 -1.0 24.68 perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
27.29 -0.8 26.46 perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
38.80 -0.8 38.02 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read
35.13 -0.8 34.36 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
15.91 -0.6 15.26 perf-profile.calltrace.cycles-pp.schedule.pipe_read.vfs_read.ksys_read.do_syscall_64
15.44 -0.6 14.79 perf-profile.calltrace.cycles-pp.__schedule.schedule.pipe_read.vfs_read.ksys_read
22.78 -0.6 22.19 perf-profile.calltrace.cycles-pp.pipe_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
9.33 -0.5 8.81 perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
3.70 -0.4 3.29 perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.pipe_read.vfs_read
1.41 -0.4 1.03 ± 2% perf-profile.calltrace.cycles-pp.check_preempt_wakeup.check_preempt_curr.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
1.66 -0.3 1.36 ± 2% perf-profile.calltrace.cycles-pp.check_preempt_curr.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
7.52 -0.3 7.24 perf-profile.calltrace.cycles-pp.activate_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
7.31 -0.3 7.06 perf-profile.calltrace.cycles-pp.enqueue_task_fair.activate_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
1.31 ± 2% -0.2 1.13 ± 2% perf-profile.calltrace.cycles-pp.prepare_task_switch.__schedule.schedule.pipe_read.vfs_read
1.57 -0.2 1.40 ± 3% perf-profile.calltrace.cycles-pp.reweight_entity.enqueue_task_fair.activate_task.ttwu_do_activate.try_to_wake_up
0.71 ± 3% -0.1 0.62 ± 4% perf-profile.calltrace.cycles-pp.update_curr.reweight_entity.enqueue_task_fair.activate_task.ttwu_do_activate
0.71 ± 3% -0.1 0.63 ± 3% perf-profile.calltrace.cycles-pp.update_curr.reweight_entity.dequeue_task_fair.__schedule.schedule
0.84 -0.0 0.80 perf-profile.calltrace.cycles-pp.___perf_sw_event.prepare_task_switch.__schedule.schedule.pipe_read
0.87 -0.0 0.84 perf-profile.calltrace.cycles-pp.place_entity.enqueue_entity.enqueue_task_fair.activate_task.ttwu_do_activate
0.84 ± 2% +0.1 0.91 ± 2% perf-profile.calltrace.cycles-pp.atime_needs_update.touch_atime.pipe_read.vfs_read.ksys_read
1.08 +0.1 1.16 ± 2% perf-profile.calltrace.cycles-pp.touch_atime.pipe_read.vfs_read.ksys_read.do_syscall_64
0.69 +0.1 0.76 perf-profile.calltrace.cycles-pp.update_load_avg.set_next_entity.pick_next_task_fair.__schedule.schedule
1.43 +0.1 1.52 perf-profile.calltrace.cycles-pp.set_next_entity.pick_next_task_fair.__schedule.schedule.pipe_read
1.02 ± 3% +0.1 1.12 ± 2% perf-profile.calltrace.cycles-pp.__fget_light.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.04 ± 3% +0.1 1.15 ± 2% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
1.29 ± 4% +0.2 1.48 ± 3% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.__libc_read
25.69 -1.0 24.71 perf-profile.children.cycles-pp.vfs_read
27.34 -0.8 26.52 perf-profile.children.cycles-pp.ksys_read
22.95 -0.6 22.38 perf-profile.children.cycles-pp.pipe_read
17.97 -0.5 17.44 perf-profile.children.cycles-pp.__schedule
9.38 -0.5 8.86 perf-profile.children.cycles-pp.ttwu_do_activate
18.42 -0.5 17.90 perf-profile.children.cycles-pp.schedule
5.42 -0.4 5.04 perf-profile.children.cycles-pp.pick_next_task_fair
1.43 -0.3 1.09 ± 2% perf-profile.children.cycles-pp.check_preempt_wakeup
1.67 -0.3 1.38 ± 2% perf-profile.children.cycles-pp.check_preempt_curr
3.09 ± 2% -0.3 2.81 ± 2% perf-profile.children.cycles-pp.reweight_entity
7.53 -0.3 7.26 perf-profile.children.cycles-pp.activate_task
7.33 -0.2 7.08 perf-profile.children.cycles-pp.enqueue_task_fair
1.68 ± 2% -0.2 1.46 ± 2% perf-profile.children.cycles-pp.prepare_task_switch
0.35 ± 2% -0.2 0.17 ± 4% perf-profile.children.cycles-pp.pick_next_entity
0.52 ± 10% -0.1 0.39 ± 11% perf-profile.children.cycles-pp.cpuacct_charge
0.88 ± 2% -0.1 0.77 ± 3% perf-profile.children.cycles-pp.__calc_delta
0.41 ± 6% -0.1 0.31 ± 9% perf-profile.children.cycles-pp.switch_mm_irqs_off
0.78 ± 2% -0.1 0.70 ± 2% perf-profile.children.cycles-pp.put_prev_entity
0.57 -0.0 0.52 perf-profile.children.cycles-pp.__cond_resched
0.24 ± 2% -0.0 0.20 ± 3% perf-profile.children.cycles-pp.__list_add_valid
1.29 -0.0 1.25 perf-profile.children.cycles-pp.mutex_lock
0.41 ± 3% -0.0 0.37 ± 3% perf-profile.children.cycles-pp.__list_del_entry_valid
0.27 ± 2% -0.0 0.23 ± 2% perf-profile.children.cycles-pp.check_cfs_rq_runtime
0.53 -0.0 0.50 ± 2% perf-profile.children.cycles-pp.copyout
0.50 ± 2% -0.0 0.47 ± 2% perf-profile.children.cycles-pp.__pthread_disable_asynccancel
0.20 ± 4% -0.0 0.18 ± 5% perf-profile.children.cycles-pp.inode_needs_update_time
0.12 ± 4% -0.0 0.10 ± 4% perf-profile.children.cycles-pp.kill_fasync
0.21 ± 4% +0.0 0.24 ± 2% perf-profile.children.cycles-pp.__x64_sys_write
0.19 ± 4% +0.0 0.22 ± 5% perf-profile.children.cycles-pp.ttwu_queue_wakelist
0.08 ± 6% +0.0 0.12 ± 5% perf-profile.children.cycles-pp.__rb_insert_augmented
0.99 +0.0 1.03 perf-profile.children.cycles-pp.__switch_to
1.33 +0.1 1.39 perf-profile.children.cycles-pp.__update_load_avg_se
0.00 +0.1 0.06 ± 8% perf-profile.children.cycles-pp.make_vfsgid
0.26 ± 3% +0.1 0.32 ± 3% perf-profile.children.cycles-pp.finish_task_switch
0.54 ± 2% +0.1 0.60 ± 2% perf-profile.children.cycles-pp.fput
0.86 ± 2% +0.1 0.93 ± 2% perf-profile.children.cycles-pp.atime_needs_update
0.50 +0.1 0.57 ± 2% perf-profile.children.cycles-pp.__dequeue_entity
1.09 +0.1 1.17 ± 2% perf-profile.children.cycles-pp.touch_atime
1.82 +0.1 1.93 perf-profile.children.cycles-pp.set_next_entity
2.03 ± 3% +0.1 2.17 perf-profile.children.cycles-pp.__fget_light
2.10 ± 3% +0.1 2.24 perf-profile.children.cycles-pp.__fdget_pos
4.33 +0.2 4.55 perf-profile.children.cycles-pp.update_load_avg
1.48 -0.3 1.14 perf-profile.self.cycles-pp.vfs_read
0.60 ± 6% -0.2 0.42 ± 7% perf-profile.self.cycles-pp.prepare_task_switch
0.71 -0.2 0.54 ± 2% perf-profile.self.cycles-pp.check_preempt_wakeup
0.50 ± 10% -0.1 0.38 ± 11% perf-profile.self.cycles-pp.cpuacct_charge
0.87 ± 2% -0.1 0.76 ± 3% perf-profile.self.cycles-pp.__calc_delta
0.62 ± 4% -0.1 0.53 perf-profile.self.cycles-pp.dequeue_entity
0.39 ± 7% -0.1 0.30 ± 9% perf-profile.self.cycles-pp.switch_mm_irqs_off
0.32 ± 3% -0.1 0.23 ± 2% perf-profile.self.cycles-pp.put_prev_entity
1.37 -0.1 1.30 ± 2% perf-profile.self.cycles-pp.pick_next_task_fair
0.58 ± 3% -0.1 0.52 ± 3% perf-profile.self.cycles-pp.exit_to_user_mode_prepare
0.76 ± 3% -0.1 0.71 ± 2% perf-profile.self.cycles-pp.__libc_read
0.40 ± 2% -0.0 0.36 ± 2% perf-profile.self.cycles-pp.__cond_resched
0.23 ± 2% -0.0 0.19 ± 3% perf-profile.self.cycles-pp.__list_add_valid
0.18 ± 3% -0.0 0.14 ± 4% perf-profile.self.cycles-pp.check_cfs_rq_runtime
0.36 ± 2% -0.0 0.33 ± 2% perf-profile.self.cycles-pp.copyout
0.19 ± 3% -0.0 0.16 ± 5% perf-profile.self.cycles-pp.activate_task
0.45 ± 2% -0.0 0.42 ± 3% perf-profile.self.cycles-pp.__pthread_disable_asynccancel
0.19 ± 3% -0.0 0.17 ± 4% perf-profile.self.cycles-pp.inode_needs_update_time
0.10 ± 5% -0.0 0.08 ± 4% perf-profile.self.cycles-pp.kill_fasync
0.12 ± 5% +0.0 0.14 ± 5% perf-profile.self.cycles-pp.exit_to_user_mode_loop
0.34 ± 3% +0.0 0.37 ± 3% perf-profile.self.cycles-pp.ksys_write
0.18 ± 3% +0.0 0.22 ± 5% perf-profile.self.cycles-pp.ttwu_queue_wakelist
0.08 ± 5% +0.0 0.11 ± 4% perf-profile.self.cycles-pp.__rb_insert_augmented
0.08 ± 7% +0.0 0.12 ± 3% perf-profile.self.cycles-pp.rb_next
0.90 +0.0 0.95 perf-profile.self.cycles-pp.__switch_to
0.39 ± 2% +0.0 0.43 ± 2% perf-profile.self.cycles-pp.__dequeue_entity
0.22 ± 2% +0.0 0.26 ± 2% perf-profile.self.cycles-pp.ttwu_do_activate
1.30 +0.1 1.35 perf-profile.self.cycles-pp.__update_load_avg_se
0.46 ± 2% +0.1 0.51 ± 2% perf-profile.self.cycles-pp.fput
0.00 +0.1 0.06 ± 8% perf-profile.self.cycles-pp.make_vfsgid
0.19 ± 4% +0.1 0.24 ± 4% perf-profile.self.cycles-pp.finish_task_switch
0.38 ± 3% +0.1 0.47 ± 3% perf-profile.self.cycles-pp.ksys_read
1.43 ± 2% +0.1 1.54 ± 2% perf-profile.self.cycles-pp.pipe_write
1.05 ± 2% +0.1 1.17 ± 2% perf-profile.self.cycles-pp.vfs_write
2.00 ± 3% +0.1 2.14 perf-profile.self.cycles-pp.__fget_light
1.71 ± 2% +0.1 1.86 ± 2% perf-profile.self.cycles-pp.update_load_avg
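
A note on reading the comparison above: rows whose middle column carries a
"%" sign (for example will-it-scale.per_thread_ops) show a relative change
between the two commits, while rows without it (the perf-profile.* and
*-rate% rows) show an absolute difference in percentage points. The sketch
below recomputes a few rows from the values in the table. The helper names
are hypothetical, and the miss-rate formula, misses / (misses + loads), is an
assumption inferred from the numbers rather than taken from the lkp-tests
source.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change in percent, as in the rows with a '%' change column."""
    return (new - base) / base * 100.0


def point_delta(base: float, new: float) -> float:
    """Absolute difference in percentage points, as in the perf-profile rows."""
    return new - base


def miss_rate(misses: float, loads: float) -> float:
    """Assumed definition of iTLB-load-miss-rate%: misses / (misses + loads)."""
    return misses / (misses + loads) * 100.0


# will-it-scale.per_thread_ops: 176131 -> 180589
print(f"{pct_change(176131, 180589):+.1f}%")      # +2.5%

# perf-stat.i.iTLB-load-misses: 42744641 -> 68665465
print(f"{pct_change(42744641, 68665465):+.1f}%")  # +60.6%

# perf-profile.calltrace.cycles-pp.vfs_read...: 25.66 -> 24.68
print(f"{point_delta(25.66, 24.68):+.1f}")        # -1.0

# perf-stat.i.iTLB-load-miss-rate% on 5e963f2bd4, from misses and loads;
# this lands close to the reported 76.32 under the assumed formula.
print(f"{miss_rate(68665465, 21361180):.2f}")
```

Under that assumed formula, the jump in iTLB-load-miss-rate% comes from both
sides of the ratio: iTLB-load-misses rose by about 60% while iTLB-loads fell
by about 22%.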



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki