[linus:master] [sched/core] ea9cffc0a1: stream.triad_bandwidth_MBps 1.1% improvement

From: kernel test robot
Date: Thu Dec 19 2024 - 21:47:00 EST




Hello,

kernel test robot noticed a 1.1% improvement of stream.triad_bandwidth_MBps on:


commit: ea9cffc0a154124821531991d5afdd7e8b20d7aa ("sched/core: Remove the unnecessary need_resched() check in nohz_csd_func()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


testcase: stream
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory
parameters:

    nr_threads: 50%
    iterations: 10x
    array_size: 50000000
    loop: 100
    cpufreq_governor: performance
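
For readers unfamiliar with the metric: the STREAM triad kernel computes a[i] = b[i] + q * c[i] and counts three array elements of memory traffic per iteration (two reads, one write). The Python sketch below only illustrates that accounting; it is not the benchmark source. It assumes 8-byte double elements (the STREAM default), and the function names are hypothetical.

```python
from typing import List

ELEM_SIZE = 8  # assumption: 8-byte double elements (STREAM's default type)


def triad(b: List[float], c: List[float], scalar: float) -> List[float]:
    """Return a where a[i] = b[i] + scalar * c[i] (the triad kernel)."""
    return [bi + scalar * ci for bi, ci in zip(b, c)]


def triad_bytes(n: int, elem_size: int = ELEM_SIZE) -> int:
    """Bytes counted for one triad pass: read b, read c, write a."""
    return 3 * elem_size * n


def bandwidth_mbps(nbytes: int, seconds: float) -> float:
    """Bandwidth in MB/s, with 1 MB taken as 1e6 bytes."""
    return nbytes / seconds / 1e6


if __name__ == "__main__":
    print(triad([1.0, 2.0], [3.0, 4.0], 3.0))
    # array_size from the parameters above
    print(triad_bytes(50_000_000))
```

Under these assumptions, one triad pass over array_size=50000000 is counted as 1.2e9 bytes, so a rate near 18500 MB/s would correspond to roughly 65 ms per pass.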



Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241220/202412201007.aa43a5fa-lkp@xxxxxxxxx

=========================================================================================
array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/rootfs/tbox_group/testcase:
50000000/gcc-12/performance/10x/x86_64-rhel-9.4/100/50%/debian-12-x86_64-20240206.cgz/lkp-skl-d02/stream

commit:
6675ce2004 ("softirq: Allow raising SCHED_SOFTIRQ from SMP-call-function on RT kernel")
ea9cffc0a1 ("sched/core: Remove the unnecessary need_resched() check in nohz_csd_func()")

6675ce20046d149e ea9cffc0a154124821531991d5a
---------------- ---------------------------
         %stddev      %change          %stddev
             \           |                \
     15264             +23.1%      18793        meminfo.Shmem
      0.02 ±  4%        +0.0        0.03 ±  4%  mpstat.cpu.all.soft%
      3818             +23.1%       4700        proc-vmstat.nr_shmem
    587.28            +302.4%       2363        vmstat.system.cs
      2577              -3.5%       2488        vmstat.system.in
     36673 ±  2%      +164.6%      97051 ±  2%  sched_debug.cpu.nr_switches.avg
     53585 ± 10%      +332.2%     231568 ± 16%  sched_debug.cpu.nr_switches.max
     12003 ± 23%      +578.7%      81463 ± 24%  sched_debug.cpu.nr_switches.stddev
    578.05            +310.5%       2372        perf-stat.i.context-switches
     14.72 ±  4%       +10.8%      16.30        perf-stat.i.cpu-migrations
      0.04 ±  5%      +268.8%       0.15        perf-stat.i.metric.K/sec
    575.63            +310.5%       2363        perf-stat.ps.context-switches
     14.65 ±  4%       +10.8%      16.23        perf-stat.ps.cpu-migrations
     18760              +1.0%      18950        stream.add_bandwidth_MBps
     18759              +1.0%      18948        stream.add_bandwidth_MBps_harmonicMean
     14581              +1.2%      14751        stream.scale_bandwidth_MBps
     14580              +1.2%      14748        stream.scale_bandwidth_MBps_harmonicMean
     18289              +1.1%      18487        stream.triad_bandwidth_MBps
     18287              +1.1%      18484        stream.triad_bandwidth_MBps_harmonicMean
      0.02 ± 12%       -32.3%       0.01 ± 16%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.02 ± 42%       -48.6%       0.01 ±  7%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.10 ± 70%      +332.7%       0.44 ± 95%  perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
     65.81 ±  3%       -68.0%      21.05 ±  3%  perf-sched.total_wait_and_delay.average.ms
      2011            +229.0%       6618 ±  4%  perf-sched.total_wait_and_delay.count.ms
     65.80 ±  3%       -68.0%      21.04 ±  3%  perf-sched.total_wait_time.average.ms
      3.86 ±  2%      -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
    500.54             +24.3%     622.17 ±  5%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    497.31 ± 14%       -98.6%       6.72 ±  7%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.02 ± 15%      -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
     19.83 ± 22%      -100.0%       0.00        perf-sched.wait_and_delay.count.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
     53.83 ±  9%     +8594.4%       4680 ±  5%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     21.00            -100.0%       0.00        perf-sched.wait_and_delay.count.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      3666 ± 51%       -72.7%       1000        perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      4.04            -100.0%       0.00        perf-sched.wait_and_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      1001            +136.0%       2362 ±  8%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      0.05 ± 37%      -100.0%       0.00        perf-sched.wait_and_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
    500.52             +24.3%     622.15 ±  5%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    497.29 ± 14%       -98.7%       6.71 ±  7%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.00 ±165%      +525.0%       0.00 ± 68%  perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      3666 ± 51%       -72.7%       1000        perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      1001            +136.0%       2362 ±  8%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      0.01 ±142%      +247.8%       0.04 ± 54%  perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
     97.56              -0.4       97.12        perf-profile.calltrace.cycles-pp.main
      1.17 ±  2%        +0.2        1.42 ±  6%  perf-profile.calltrace.cycles-pp.common_startup_64
     97.61              -0.4       97.17        perf-profile.children.cycles-pp.main
      0.02 ±141%        +0.1        0.07 ± 14%  perf-profile.children.cycles-pp.poll_idle
      0.00              +0.1        0.06 ± 15%  perf-profile.children.cycles-pp.__hrtimer_start_range_ns
      0.00              +0.1        0.06 ± 15%  perf-profile.children.cycles-pp.dequeue_entity
      0.00              +0.1        0.06 ± 11%  perf-profile.children.cycles-pp.enqueue_dl_entity
      0.00              +0.1        0.06 ± 11%  perf-profile.children.cycles-pp.dl_server_start
      0.00              +0.1        0.06 ± 17%  perf-profile.children.cycles-pp.hrtimer_start_range_ns
      0.00              +0.1        0.06 ± 21%  perf-profile.children.cycles-pp.pick_next_task_fair
      0.00              +0.1        0.06 ± 11%  perf-profile.children.cycles-pp.update_load_avg
      0.00              +0.1        0.08 ± 17%  perf-profile.children.cycles-pp.__pick_next_task
      0.00              +0.1        0.10 ± 19%  perf-profile.children.cycles-pp.dequeue_entities
      0.00              +0.1        0.11 ± 17%  perf-profile.children.cycles-pp.dequeue_task_fair
      0.00              +0.1        0.11 ± 18%  perf-profile.children.cycles-pp.try_to_block_task
      0.01 ±223%        +0.1        0.14 ±  8%  perf-profile.children.cycles-pp.enqueue_task
      0.00              +0.1        0.13 ±  8%  perf-profile.children.cycles-pp.enqueue_task_fair
      0.01 ±223%        +0.1        0.14 ±  8%  perf-profile.children.cycles-pp.ttwu_do_activate
      0.05 ±  7%        +0.2        0.20 ± 11%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.00              +0.2        0.18 ± 11%  perf-profile.children.cycles-pp.try_to_wake_up
      0.07 ± 14%        +0.2        0.25 ± 18%  perf-profile.children.cycles-pp.kthread
      0.07 ±  8%        +0.2        0.25 ± 18%  perf-profile.children.cycles-pp.ret_from_fork
      0.07 ±  8%        +0.2        0.25 ± 19%  perf-profile.children.cycles-pp.ret_from_fork_asm
      0.02 ±141%        +0.2        0.20 ± 20%  perf-profile.children.cycles-pp.schedule
      0.00              +0.2        0.19 ± 24%  perf-profile.children.cycles-pp.smpboot_thread_fn
      0.05 ±  8%        +0.2        0.25 ± 21%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
      1.17 ±  2%        +0.2        1.42 ±  6%  perf-profile.children.cycles-pp.common_startup_64
      1.17 ±  2%        +0.2        1.42 ±  6%  perf-profile.children.cycles-pp.cpu_startup_entry
      1.17 ±  2%        +0.2        1.42 ±  6%  perf-profile.children.cycles-pp.do_idle
      0.09 ± 39%        +0.2        0.34 ± 20%  perf-profile.children.cycles-pp.__schedule
     97.30              -0.4       96.86        perf-profile.self.cycles-pp.main
      0.02 ±141%        +0.0        0.06 ± 14%  perf-profile.self.cycles-pp.poll_idle
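
As a reading aid for the table above: the %change column appears to be the relative difference between the two commits, (new - base) / base, while rows whose metric is already a percentage (mpstat soft%, perf-profile cycles-pp) show a plain difference in percentage points with no % sign. The Python sketch below reproduces a few rows under that assumption; the helper names are hypothetical, and because the table prints rounded values, recomputed figures can differ in the last digit.

```python
def relative_change(base: float, new: float) -> float:
    """Percent change of `new` relative to `base` (the base commit's value)."""
    return (new - base) / base * 100.0


def point_change(base: float, new: float) -> float:
    """Plain difference, for metrics that are already percentages."""
    return new - base


if __name__ == "__main__":
    # (metric, base value, new value) copied from the table above
    relative_rows = [
        ("stream.triad_bandwidth_MBps", 18289, 18487),
        ("vmstat.system.cs", 587.28, 2363),
        ("meminfo.Shmem", 15264, 18793),
    ]
    for name, base, new in relative_rows:
        print(f"{relative_change(base, new):+9.1f}%  {name}")

    # perf-profile rows are shares of cycles, so the change is in points
    print(f"{point_change(97.56, 97.12):+9.1f}   perf-profile.calltrace.cycles-pp.main")
```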




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki