Re: [lkp-developer] [sched/core] 6b94780e45: unixbench.score -4.5% regression

From: Vincent Guittot
Date: Mon Jan 02 2017 - 09:56:48 EST


Hi Xiaolong,

On Monday 19 Dec 2016 at 08:14:53 (+0800), kernel test robot wrote:
>
> Greetings,
>
> FYI, we noticed a -4.5% regression of unixbench.score due to commit:

I have been able to restore performance on my platform with the patch below.
Could you test it?

---
 kernel/sched/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 393759b..6e7d45c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2578,6 +2578,7 @@ void wake_up_new_task(struct task_struct *p)
 	__set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
 #endif
 	rq = __task_rq_lock(p, &rf);
+	update_rq_clock(rq);
 	post_init_entity_util_avg(&p->se);
 
 	activate_task(rq, p, 0);
--
2.7.4

Vincent

>
>
> commit: 6b94780e45c17b83e3e75f8aaca5a328db583c74 ("sched/core: Use load_avg for selecting idlest group")
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>
> in testcase: unixbench
> on test machine: 24 threads Nehalem-EP with 24G memory
> with following parameters:
>
> runtime: 300s
> nr_task: 100%
> test: shell1
> cpufreq_governor: performance
>
> test-description: UnixBench is the original BYTE UNIX benchmark suite; it aims to test the performance of Unix-like systems.
> test-url: https://github.com/kdlucas/byte-unixbench
>
> In addition to that, the commit also has a significant impact on the following tests:
>
> +------------------+-----------------------------------------------------------------------+
> | testcase: change | unixbench: unixbench.score -2.9% regression |
> | test machine | 8 threads Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz with 6G memory |
> | test parameters | nr_task=1 |
> | | runtime=300s |
> | | test=shell8 |
> +------------------+-----------------------------------------------------------------------+
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml # job file is attached in this email
> bin/lkp run job.yaml
>
> testcase/path_params/tbox_group/run: unixbench/300s-100%-shell1-performance/lkp-wsm-ep1
>
> f519a3f1c6b7a990 6b94780e45c17b83e3e75f8aac
> ---------------- --------------------------
> 25565 -5% 24414 unixbench.score
> 29557557 28781098 unixbench.time.voluntary_context_switches
> 5743 -4% 5514 unixbench.time.user_time
> 9.232e+08 -4% 8.831e+08 unixbench.time.minor_page_faults
> 1807 -5% 1709 unixbench.time.percent_of_cpu_this_job_got
> 5656 -7% 5271 unixbench.time.system_time
> 13223805 -20% 10628072 unixbench.time.involuntary_context_switches
> 741766 -62% 279054 interrupts.CAL:Function_call_interrupts
> 31060 -9% 28214 vmstat.system.in
> 126250 -12% 110890 vmstat.system.cs
> 78.58 -6% 74.20 turbostat.%Busy
> 2507 -6% 2366 turbostat.Avg_MHz
> 9134 ± 47% -6e+03 2973 ± 36% latency_stats.max.pipe_read.__vfs_read.vfs_read.SyS_read.entry_SYSCALL_64_fastpath
> 380879 ± 10% 5e+05 887692 ± 49% latency_stats.sum.wait_on_page_bit_killable.__lock_page_or_retry.filemap_fault.__do_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
> 31710 ± 15% -2e+04 10583 ± 14% latency_stats.sum.call_rwsem_down_write_failed.__vma_adjust.__split_vma.do_munmap.vm_munmap.elf_map.load_elf_binary.search_binary_handler.do_execveat_common.SyS_execve.do_syscall_64.return_from_SYSCALL_64
> 51796 ± 4% -4e+04 15457 ± 10% latency_stats.sum.call_rwsem_down_write_failed.unlink_file_vma.free_pgtables.unmap_region.do_munmap.vm_munmap.elf_map.load_elf_binary.search_binary_handler.do_execveat_common.SyS_execve.do_syscall_64
> 111998 ± 18% -7e+04 37074 ± 14% latency_stats.sum.call_rwsem_down_write_failed.__vma_adjust.__split_vma.do_munmap.mmap_region.do_mmap.vm_mmap_pgoff.SyS_mmap_pgoff.SyS_mmap.entry_SYSCALL_64_fastpath
> 275087 ± 15% -2e+05 81973 ± 3% latency_stats.sum.call_rwsem_down_write_failed.unlink_file_vma.free_pgtables.unmap_region.do_munmap.mmap_region.do_mmap.vm_mmap_pgoff.SyS_mmap_pgoff.SyS_mmap.entry_SYSCALL_64_fastpath
> 930993 ± 12% -6e+05 320520 ± 4% latency_stats.sum.call_rwsem_down_write_failed.vma_link.mmap_region.do_mmap.vm_mmap_pgoff.vm_mmap.elf_map.load_elf_binary.search_binary_handler.do_execveat_common.SyS_execve.do_syscall_64
> 4755783 ± 9% -3e+06 1619348 ± 4% latency_stats.sum.call_rwsem_down_write_failed.__vma_adjust.__split_vma.split_vma.mprotect_fixup.do_mprotect_pkey.SyS_mprotect.entry_SYSCALL_64_fastpath
> 5536067 ± 10% -4e+06 1929338 ± 3% latency_stats.sum.call_rwsem_down_write_failed.copy_process._do_fork.SyS_clone.do_syscall_64.return_from_SYSCALL_64
> 9.032e+08 -4% 8.64e+08 perf-stat.page-faults
> 9.032e+08 -4% 8.64e+08 perf-stat.minor-faults
> 2.329e+09 2.269e+09 perf-stat.node-load-misses
> 2.2e+09 -9% 2.011e+09 ± 5% perf-stat.dTLB-store-misses
> 3.278e+10 -9% 2.987e+10 ± 6% perf-stat.dTLB-load-misses
> 19484819 13% 21974129 perf-stat.cpu-migrations
> 3.755e+13 -6% 3.54e+13 perf-stat.cpu-cycles
> 3244 4% 3379 perf-stat.instructions-per-iTLB-miss
> 4.536e+12 -4% 4.332e+12 perf-stat.branch-instructions
> 2.303e+13 -4% 2.208e+13 perf-stat.instructions
> 5.768e+12 -4% 5.517e+12 perf-stat.dTLB-loads
> 3.567e+11 -4% 3.414e+11 perf-stat.cache-references
> 2.97 2.93 perf-stat.branch-miss-rate%
> 2.768e+10 2.699e+10 perf-stat.node-stores
> 5.446e+10 -3% 5.275e+10 perf-stat.cache-misses
> 0.03 -4% 0.03 perf-stat.iTLB-load-miss-rate%
> 9.673e+09 -4% 9.294e+09 perf-stat.node-loads
> 3.596e+12 -4% 3.442e+12 perf-stat.dTLB-stores
> 0.61 0.62 perf-stat.ipc
> 1.347e+11 -6% 1.27e+11 perf-stat.branch-misses
> 7.098e+09 -8% 6.533e+09 perf-stat.iTLB-load-misses
> 2.309e+13 -4% 2.206e+13 perf-stat.iTLB-loads
> 79911173 -12% 70187035 perf-stat.context-switches
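
The rounded percentages in the table above can be cross-checked against the raw values. The following is only a minimal sketch: the helper name pct_change and the choice of rows are illustrative assumptions and are not part of lkp-tests. The numbers are copied from the table, with the left column taken as the parent commit and the right column as 6b94780e45.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent (hypothetical helper)."""
    return (new - base) / base * 100.0


# (parent commit value, 6b94780e45 value), copied from the table above.
samples = {
    "unixbench.score": (25565, 24414),
    "unixbench.time.involuntary_context_switches": (13223805, 10628072),
    "interrupts.CAL:Function_call_interrupts": (741766, 279054),
    "perf-stat.cpu-migrations": (19484819, 21974129),
}

for name, (base, new) in samples.items():
    # Print the signed change with one decimal, followed by the metric name.
    print(f"{pct_change(base, new):+6.1f}%  {name}")
```

For unixbench.score this gives about -4.5%, which matches the figure in the subject line; the table itself shows the same change rounded to -5%.
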
>
>
>
> turbostat._Busy
>
> 90 ++-------------------------------------*---*---------------------------+
> | .. *...*.. |
> 80 *+..*..*...*..*...*..*...*..*...O...* O O O O O...O..O...O O O
> 70 O+ O O O O O O O O |
> | |
> 60 ++ |
> 50 ++ |
> | |
> 40 ++ |
> 30 ++ |
> | |
> 20 ++ |
> 10 ++ |
> | |
> 0 ++----------------------------------O----------------------------------+
>
>
>
>
>
> unixbench.time.percent_of_cpu_this_job_got
>
> 2500 ++-------------------------------------------------------------------+
> | |
> | .*... |
> 2000 ++ .*. *..*... |
> *..*...*..*...*..*...*..*...*..O...*. O O O O O..O...O..O O O
> O O O O O O O O O |
> 1500 ++ |
> | |
> 1000 ++ |
> | |
> | |
> 500 ++ |
> | |
> | |
> 0 ++---------------------------------O---------------------------------+
>
>
> vmstat.system.in
>
> 40000 ++------------------------------------------------------------------+
> | .*...*.. |
> 35000 ++ .*...*. |
> 30000 *+.*...*..*...*..*..*...*..*...*..*. *..*...*..* |
> O O O O O O O O O O O O O O O O O O O O
> 25000 ++ |
> | |
> 20000 ++ |
> | |
> 15000 ++ |
> 10000 ++ |
> | |
> 5000 ++ |
> | |
> 0 ++--------------------------------O---------------------------------+
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Xiaolong