Re: [fs] 936e92b615: unixbench.score 32.3% improvement

From: Shaokun Zhang
Date: Mon Jul 13 2020 - 04:49:43 EST


Hi maintainers,

This issue was debugged on the Huawei Kunpeng 920, which is an ARM64 platform, and we have also run
more tests on the x86 platform.
Since Rong has also reported the improvement on x86, it seems worthwhile for us to proceed with this change.
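
For reference, the motivation behind the patch is false sharing in struct
file: @f_count is dirtied on every fget()/fput(), while @f_mode is
read-mostly on the same fast paths (see __fget_files() and filp_close() in
the profile below), so keeping both fields on one cacheline lets the
refcount writes repeatedly invalidate the line the readers need. Below is a
minimal userspace sketch of that effect; it is not the real struct file
layout, and the field names, the 64-byte cacheline size, and the iteration
count are illustrative assumptions:

/*
 * false_sharing_demo.c - a writer thread hammers a counter (stand-in for
 * @f_count) while a reader thread keeps loading a mode field (stand-in
 * for @f_mode), first with both fields on one cacheline, then padded apart.
 * Build: gcc -O2 -pthread false_sharing_demo.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define ITERS 100000000UL

struct shared {                         /* fields share a cacheline */
        atomic_long mode;
        atomic_long count;
};

struct padded {                         /* count pushed to its own line */
        atomic_long mode;
        char pad[64];                   /* assumes 64-byte cachelines */
        atomic_long count;
};

static _Alignas(64) struct shared s;
static _Alignas(64) struct padded p;

static void *writer(void *arg)          /* models fget()/fput() traffic */
{
        atomic_long *count = arg;

        for (unsigned long i = 0; i < ITERS; i++)
                atomic_fetch_add_explicit(count, 1, memory_order_relaxed);
        return NULL;
}

static void *reader(void *arg)          /* models read-mostly @f_mode users */
{
        atomic_long *mode = arg;
        long sink = 0;

        for (unsigned long i = 0; i < ITERS; i++)
                sink += atomic_load_explicit(mode, memory_order_relaxed);
        return (void *)sink;            /* keep the loads observable */
}

static double run(atomic_long *mode, atomic_long *count)
{
        struct timespec t0, t1;
        pthread_t w, r;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        pthread_create(&w, NULL, writer, count);
        pthread_create(&r, NULL, reader, mode);
        pthread_join(w, NULL);
        pthread_join(r, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
        printf("same cacheline:      %.2fs\n", run(&s.mode, &s.count));
        printf("separate cachelines: %.2fs\n", run(&p.mode, &p.count));
        return 0;
}

On a multi-core machine the padded layout should finish the reader side
noticeably faster; that is the same kind of win the dup/close-heavy
unixbench syscall test is reporting below.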
Any comments on it?

Thanks,
Shaokun

On 2020/7/8 15:23, kernel test robot wrote:
> Greeting,
>
> FYI, we noticed a 32.3% improvement of unixbench.score due to commit:
>
>
> commit: 936e92b615e212d08eb74951324bef25ba564c34 ("[PATCH RESEND] fs: Move @f_count to different cacheline with @f_mode")
> url: https://github.com/0day-ci/linux/commits/Shaokun-Zhang/fs-Move-f_count-to-different-cacheline-with-f_mode/20200624-163511
> base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 5e857ce6eae7ca21b2055cca4885545e29228fe2
>
> in testcase: unixbench
> on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
> with following parameters:
>
> runtime: 300s
> nr_task: 30%
> test: syscall
> cpufreq_governor: performance
> ucode: 0x5002f01
>
> test-description: UnixBench is the original BYTE UNIX benchmark suite, which aims to test the performance of Unix-like systems.
> test-url: https://github.com/kdlucas/byte-unixbench
>
>
>
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml # job file is attached in this email
> bin/lkp run job.yaml
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
> gcc-9/performance/x86_64-rhel-7.6/30%/debian-x86_64-20191114.cgz/300s/lkp-csl-2ap3/syscall/unixbench/0x5002f01
>
> commit:
> 5e857ce6ea ("Merge branch 'hch' (maccess patches from Christoph Hellwig)")
> 936e92b615 ("fs: Move @f_count to different cacheline with @f_mode")
>
> 5e857ce6eae7ca21 936e92b615e212d08eb74951324
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 2297 ± 2% +32.3% 3038 unixbench.score
> 171.74 +34.8% 231.55 unixbench.time.user_time
> 1.366e+09 +32.6% 1.812e+09 unixbench.workload
> 26472 ± 6% +1270.0% 362665 ±158% cpuidle.C1.usage
> 0.25 ± 2% +0.1 0.33 mpstat.cpu.all.usr%
> 8.32 ± 43% +129.7% 19.12 ± 63% sched_debug.cpu.clock.stddev
> 8.32 ± 43% +129.7% 19.12 ± 63% sched_debug.cpu.clock_task.stddev
> 2100 ± 2% -15.6% 1772 ± 9% sched_debug.cpu.nr_switches.min
> 373.34 ± 3% +12.4% 419.48 ± 6% sched_debug.cpu.ttwu_local.stddev
> 2740 ± 12% -72.3% 757.75 ±105% numa-vmstat.node0.nr_inactive_anon
> 3139 ± 8% -69.9% 946.25 ± 97% numa-vmstat.node0.nr_shmem
> 2740 ± 12% -72.3% 757.75 ±105% numa-vmstat.node0.nr_zone_inactive_anon
> 373.75 ± 51% +443.3% 2030 ± 26% numa-vmstat.node2.nr_inactive_anon
> 496.00 ± 19% +366.1% 2311 ± 29% numa-vmstat.node2.nr_shmem
> 373.75 ± 51% +443.3% 2030 ± 26% numa-vmstat.node2.nr_zone_inactive_anon
> 13728 ± 13% +148.1% 34056 ± 46% numa-vmstat.node3.nr_active_anon
> 78558 +11.3% 87431 ± 6% numa-vmstat.node3.nr_file_pages
> 9939 ± 8% +19.7% 11902 ± 13% numa-vmstat.node3.nr_shmem
> 13728 ± 13% +148.1% 34056 ± 46% numa-vmstat.node3.nr_zone_active_anon
> 11103 ± 13% -71.2% 3201 ± 99% numa-meminfo.node0.Inactive
> 10962 ± 12% -72.3% 3032 ±105% numa-meminfo.node0.Inactive(anon)
> 8551 ± 31% -29.4% 6034 ± 18% numa-meminfo.node0.Mapped
> 12560 ± 8% -69.9% 3786 ± 97% numa-meminfo.node0.Shmem
> 1596 ± 51% +415.6% 8230 ± 24% numa-meminfo.node2.Inactive
> 1496 ± 51% +442.8% 8122 ± 26% numa-meminfo.node2.Inactive(anon)
> 1984 ± 19% +366.1% 9248 ± 29% numa-meminfo.node2.Shmem
> 54929 ± 13% +148.0% 136212 ± 46% numa-meminfo.node3.Active
> 54929 ± 13% +148.0% 136206 ± 46% numa-meminfo.node3.Active(anon)
> 314216 +11.3% 349697 ± 6% numa-meminfo.node3.FilePages
> 747907 ± 2% +15.2% 861672 ± 9% numa-meminfo.node3.MemUsed
> 39744 ± 8% +19.7% 47580 ± 13% numa-meminfo.node3.Shmem
> 13.94 ± 6% -13.9 0.00 perf-profile.calltrace.cycles-pp.dnotify_flush.filp_close.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.00 +0.7 0.66 ± 8% perf-profile.calltrace.cycles-pp.__x64_sys_umask.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 31.64 ± 8% +3.4 35.08 ± 5% perf-profile.calltrace.cycles-pp.__fget_files.ksys_dup.__x64_sys_dup.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 6.82 ± 8% +5.6 12.41 ± 12% perf-profile.calltrace.cycles-pp.fput_many.filp_close.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 23.54 ± 58% +12.7 36.27 ± 5% perf-profile.calltrace.cycles-pp.ksys_dup.__x64_sys_dup.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 23.54 ± 58% +12.7 36.29 ± 5% perf-profile.calltrace.cycles-pp.__x64_sys_dup.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 13.98 ± 6% -14.0 0.00 perf-profile.children.cycles-pp.dnotify_flush
> 39.81 ± 6% -10.8 28.96 ± 9% perf-profile.children.cycles-pp.filp_close
> 40.13 ± 6% -10.7 29.44 ± 9% perf-profile.children.cycles-pp.__x64_sys_close
> 0.15 ± 10% -0.0 0.13 ± 8% perf-profile.children.cycles-pp.scheduler_tick
> 0.05 ± 8% +0.0 0.07 ± 6% perf-profile.children.cycles-pp.__x64_sys_getuid
> 0.10 ± 7% +0.0 0.12 ± 8% perf-profile.children.cycles-pp.__prepare_exit_to_usermode
> 0.44 ± 7% +0.1 0.56 ± 6% perf-profile.children.cycles-pp.syscall_return_via_sysret
> 31.78 ± 8% +3.4 35.22 ± 5% perf-profile.children.cycles-pp.__fget_files
> 32.52 ± 8% +3.7 36.27 ± 5% perf-profile.children.cycles-pp.ksys_dup
> 32.54 ± 8% +3.8 36.30 ± 5% perf-profile.children.cycles-pp.__x64_sys_dup
> 6.86 ± 7% +5.6 12.45 ± 12% perf-profile.children.cycles-pp.fput_many
> 13.91 ± 6% -13.9 0.00 perf-profile.self.cycles-pp.dnotify_flush
> 18.05 ± 5% -1.6 16.41 ± 7% perf-profile.self.cycles-pp.filp_close
> 0.06 ± 6% +0.0 0.08 ± 8% perf-profile.self.cycles-pp.__prepare_exit_to_usermode
> 0.09 ± 9% +0.0 0.11 ± 7% perf-profile.self.cycles-pp.do_syscall_64
> 0.16 ± 9% +0.0 0.20 ± 4% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> 0.30 ± 8% +0.1 0.36 ± 7% perf-profile.self.cycles-pp.entry_SYSCALL_64
> 0.44 ± 7% +0.1 0.56 ± 6% perf-profile.self.cycles-pp.syscall_return_via_sysret
> 31.61 ± 8% +3.4 35.00 ± 5% perf-profile.self.cycles-pp.__fget_files
> 6.81 ± 7% +5.6 12.38 ± 12% perf-profile.self.cycles-pp.fput_many
> 36623 ± 3% +11.5% 40822 ± 7% softirqs.CPU100.SCHED
> 16499 ± 40% +27.8% 21088 ± 35% softirqs.CPU122.RCU
> 16758 ± 41% +30.0% 21781 ± 35% softirqs.CPU126.RCU
> 178.25 ± 11% +7718.2% 13936 ±168% softirqs.CPU13.NET_RX
> 40883 ± 4% -6.9% 38055 ± 2% softirqs.CPU132.SCHED
> 16029 ± 41% +35.9% 21789 ± 33% softirqs.CPU144.RCU
> 16220 ± 43% +32.4% 21484 ± 35% softirqs.CPU145.RCU
> 16393 ± 39% +29.9% 21301 ± 32% softirqs.CPU146.RCU
> 16217 ± 39% +29.8% 21055 ± 35% softirqs.CPU147.RCU
> 37011 ± 12% +12.4% 41589 ± 5% softirqs.CPU149.SCHED
> 16127 ± 41% +34.5% 21685 ± 34% softirqs.CPU150.RCU
> 16131 ± 41% +32.3% 21333 ± 35% softirqs.CPU151.RCU
> 16558 ± 37% +28.2% 21230 ± 34% softirqs.CPU152.RCU
> 15863 ± 40% +34.1% 21266 ± 32% softirqs.CPU153.RCU
> 16044 ± 41% +32.7% 21286 ± 34% softirqs.CPU154.RCU
> 16057 ± 40% +34.9% 21658 ± 33% softirqs.CPU155.RCU
> 16352 ± 39% +31.0% 21423 ± 33% softirqs.CPU156.RCU
> 16006 ± 39% +33.4% 21348 ± 32% softirqs.CPU158.RCU
> 16300 ± 41% +32.0% 21521 ± 34% softirqs.CPU161.RCU
> 37546 ± 4% +13.5% 42605 ± 3% softirqs.CPU161.SCHED
> 16411 ± 41% +33.4% 21894 ± 33% softirqs.CPU162.RCU
> 16329 ± 41% +32.9% 21704 ± 35% softirqs.CPU163.RCU
> 16517 ± 39% +29.8% 21441 ± 34% softirqs.CPU164.RCU
> 16227 ± 41% +32.3% 21471 ± 34% softirqs.CPU165.RCU
> 16347 ± 40% +31.4% 21481 ± 35% softirqs.CPU166.RCU
> 16360 ± 43% +32.2% 21631 ± 35% softirqs.CPU167.RCU
> 36986 +11.3% 41148 ± 6% softirqs.CPU167.SCHED
> 16218 ± 44% +34.7% 21843 ± 33% softirqs.CPU189.RCU
> 16501 ± 39% +32.0% 21783 ± 33% softirqs.CPU52.RCU
> 17101 ± 41% +29.4% 22121 ± 35% softirqs.CPU68.RCU
> 1.087e+09 +20.9% 1.314e+09 perf-stat.i.branch-instructions
> 19778787 +22.1% 24144895 ± 16% perf-stat.i.branch-misses
> 22.88 -17.7% 18.84 ± 2% perf-stat.i.cpi
> 1.635e+09 +23.6% 2.021e+09 perf-stat.i.dTLB-loads
> 20648 ± 2% +218.4% 65736 ±110% perf-stat.i.dTLB-store-misses
> 1.023e+09 +24.8% 1.276e+09 perf-stat.i.dTLB-stores
> 78.10 +1.4 79.54 perf-stat.i.iTLB-load-miss-rate%
> 16169669 +8.2% 17493234 perf-stat.i.iTLB-load-misses
> 5.364e+09 +21.3% 6.507e+09 perf-stat.i.instructions
> 369.33 +11.8% 413.03 ± 5% perf-stat.i.instructions-per-iTLB-miss
> 0.41 ± 2% +83.3% 0.76 ± 16% perf-stat.i.metric.K/sec
> 19.79 +23.2% 24.39 perf-stat.i.metric.M/sec
> 4460149 ± 2% -45.1% 2447884 ± 14% perf-stat.i.node-load-misses
> 241219 ± 2% -58.8% 99443 ± 47% perf-stat.i.node-loads
> 1679821 ± 2% -4.4% 1605611 ± 3% perf-stat.i.node-store-misses
> 25.91 -17.6% 21.36 perf-stat.overall.cpi
> 82.51 +1.7 84.17 perf-stat.overall.iTLB-load-miss-rate%
> 331.21 +12.2% 371.62 perf-stat.overall.instructions-per-iTLB-miss
> 0.04 +21.3% 0.05 perf-stat.overall.ipc
> 1566 -8.4% 1435 perf-stat.overall.path-length
> 1.089e+09 +21.0% 1.318e+09 perf-stat.ps.branch-instructions
> 19801099 +21.7% 24102537 ± 15% perf-stat.ps.branch-misses
> 1.641e+09 +23.6% 2.028e+09 perf-stat.ps.dTLB-loads
> 20512 ± 2% +212.7% 64142 ±109% perf-stat.ps.dTLB-store-misses
> 1.027e+09 +24.8% 1.282e+09 perf-stat.ps.dTLB-stores
> 16239916 +8.2% 17567773 perf-stat.ps.iTLB-load-misses
> 5.378e+09 +21.4% 6.527e+09 perf-stat.ps.instructions
> 4485062 ± 2% -45.2% 2458026 ± 14% perf-stat.ps.node-load-misses
> 242388 ± 2% -59.0% 99493 ± 47% perf-stat.ps.node-loads
> 1689890 ± 2% -4.5% 1614182 ± 3% perf-stat.ps.node-store-misses
> 2.139e+12 +21.5% 2.6e+12 perf-stat.total.instructions
> 288.00 ± 13% +8910.9% 25951 ±168% interrupts.34:PCI-MSI.524292-edge.eth0-TxRx-3
> 2042 ± 57% +190.2% 5927 ± 26% interrupts.CPU1.NMI:Non-maskable_interrupts
> 2042 ± 57% +190.2% 5927 ± 26% interrupts.CPU1.PMI:Performance_monitoring_interrupts
> 3.75 ± 34% +2373.3% 92.75 ±130% interrupts.CPU100.TLB:TLB_shootdowns
> 3510 ± 88% -85.1% 522.00 ±124% interrupts.CPU107.NMI:Non-maskable_interrupts
> 3510 ± 88% -85.1% 522.00 ±124% interrupts.CPU107.PMI:Performance_monitoring_interrupts
> 3813 ± 74% -73.3% 1018 ±150% interrupts.CPU110.NMI:Non-maskable_interrupts
> 3813 ± 74% -73.3% 1018 ±150% interrupts.CPU110.PMI:Performance_monitoring_interrupts
> 4536 ± 51% -97.1% 131.50 ± 8% interrupts.CPU111.NMI:Non-maskable_interrupts
> 4536 ± 51% -97.1% 131.50 ± 8% interrupts.CPU111.PMI:Performance_monitoring_interrupts
> 4476 ± 47% -97.5% 113.00 ± 19% interrupts.CPU112.NMI:Non-maskable_interrupts
> 4476 ± 47% -97.5% 113.00 ± 19% interrupts.CPU112.PMI:Performance_monitoring_interrupts
> 3522 ± 36% +92.7% 6787 ± 16% interrupts.CPU120.NMI:Non-maskable_interrupts
> 3522 ± 36% +92.7% 6787 ± 16% interrupts.CPU120.PMI:Performance_monitoring_interrupts
> 2888 ± 66% +117.5% 6283 ± 21% interrupts.CPU123.NMI:Non-maskable_interrupts
> 2888 ± 66% +117.5% 6283 ± 21% interrupts.CPU123.PMI:Performance_monitoring_interrupts
> 3109 ± 61% +132.5% 7230 ± 7% interrupts.CPU124.NMI:Non-maskable_interrupts
> 3109 ± 61% +132.5% 7230 ± 7% interrupts.CPU124.PMI:Performance_monitoring_interrupts
> 1067 ± 19% -21.6% 836.50 interrupts.CPU125.CAL:Function_call_interrupts
> 288.00 ± 13% +8910.9% 25951 ±168% interrupts.CPU13.34:PCI-MSI.524292-edge.eth0-TxRx-3
> 244.25 ± 96% -95.3% 11.50 ± 95% interrupts.CPU13.TLB:TLB_shootdowns
> 2056 ±117% +206.3% 6298 ± 20% interrupts.CPU130.NMI:Non-maskable_interrupts
> 2056 ±117% +206.3% 6298 ± 20% interrupts.CPU130.PMI:Performance_monitoring_interrupts
> 831.50 +21.4% 1009 ± 13% interrupts.CPU133.CAL:Function_call_interrupts
> 8.00 ± 29% +634.4% 58.75 ±119% interrupts.CPU133.RES:Rescheduling_interrupts
> 1629 ±159% +265.3% 5952 ± 29% interrupts.CPU139.NMI:Non-maskable_interrupts
> 1629 ±159% +265.3% 5952 ± 29% interrupts.CPU139.PMI:Performance_monitoring_interrupts
> 1660 ±159% +161.0% 4332 ± 61% interrupts.CPU141.NMI:Non-maskable_interrupts
> 1660 ±159% +161.0% 4332 ± 61% interrupts.CPU141.PMI:Performance_monitoring_interrupts
> 882.75 ±147% +542.5% 5671 ± 38% interrupts.CPU143.NMI:Non-maskable_interrupts
> 882.75 ±147% +542.5% 5671 ± 38% interrupts.CPU143.PMI:Performance_monitoring_interrupts
> 2600 ± 29% +68.8% 4389 ± 47% interrupts.CPU144.NMI:Non-maskable_interrupts
> 2600 ± 29% +68.8% 4389 ± 47% interrupts.CPU144.PMI:Performance_monitoring_interrupts
> 1494 ± 20% +91.3% 2859 ± 29% interrupts.CPU147.NMI:Non-maskable_interrupts
> 1494 ± 20% +91.3% 2859 ± 29% interrupts.CPU147.PMI:Performance_monitoring_interrupts
> 3657 ± 54% -96.3% 133.75 ± 8% interrupts.CPU15.NMI:Non-maskable_interrupts
> 3657 ± 54% -96.3% 133.75 ± 8% interrupts.CPU15.PMI:Performance_monitoring_interrupts
> 5165 ± 40% -97.8% 115.00 ± 26% interrupts.CPU16.NMI:Non-maskable_interrupts
> 5165 ± 40% -97.8% 115.00 ± 26% interrupts.CPU16.PMI:Performance_monitoring_interrupts
> 34.00 ±125% -84.6% 5.25 ± 49% interrupts.CPU186.RES:Rescheduling_interrupts
> 1033 ± 24% -19.0% 836.75 interrupts.CPU190.CAL:Function_call_interrupts
> 68.00 ± 28% +55.5% 105.75 ± 9% interrupts.CPU26.RES:Rescheduling_interrupts
> 882.25 ± 4% +6.3% 937.75 ± 7% interrupts.CPU32.CAL:Function_call_interrupts
> 139.25 ± 96% -74.0% 36.25 ± 72% interrupts.CPU32.TLB:TLB_shootdowns
> 848.25 ±130% +368.9% 3977 ± 56% interrupts.CPU35.NMI:Non-maskable_interrupts
> 848.25 ±130% +368.9% 3977 ± 56% interrupts.CPU35.PMI:Performance_monitoring_interrupts
> 958.25 ± 11% -10.6% 856.75 interrupts.CPU36.CAL:Function_call_interrupts
> 1903 ± 72% +127.9% 4337 ± 23% interrupts.CPU41.NMI:Non-maskable_interrupts
> 1903 ± 72% +127.9% 4337 ± 23% interrupts.CPU41.PMI:Performance_monitoring_interrupts
> 1320 ±158% +245.4% 4560 ± 32% interrupts.CPU47.NMI:Non-maskable_interrupts
> 1320 ±158% +245.4% 4560 ± 32% interrupts.CPU47.PMI:Performance_monitoring_interrupts
> 837.50 +5.2% 881.25 ± 4% interrupts.CPU61.CAL:Function_call_interrupts
> 1074 ± 28% -22.1% 836.50 interrupts.CPU69.CAL:Function_call_interrupts
> 1042 ± 12% -18.7% 847.50 ± 2% interrupts.CPU86.CAL:Function_call_interrupts
>
>
>
> unixbench.score
>
> 3200 +--------------------------------------------------------------------+
> | O O O |
> 3000 |-+ O O O O O O O O O |
> | O O O O |
> | O |
> 2800 |-+ |
> | |
> 2600 |-+ |
> | |
> 2400 |-+ |
> | +.+.. .+.+..+. +..+. .+. .+. .+..+.+.+..+.+.+. .+.|
> |.+.. + .+ +.+..+. + + +. + +. |
> 2200 |-+ + + + |
> | |
> 2000 +--------------------------------------------------------------------+
>
>
> unixbench.workload
>
> 1.9e+09 +-----------------------------------------------------------------+
> | O O O O |
> 1.8e+09 |-+ O O O O O O O O |
> | O O O O O |
> 1.7e+09 |-+ |
> | |
> 1.6e+09 |-+ |
> | |
> 1.5e+09 |-+ |
> | |
> 1.4e+09 |-+ +.+ .+..+.+ +.+. .+.. .+. .+..+. .+. .+.. .|
> |.+. .. : + + .+.+.. + + + +.+ + + +.+ |
> 1.3e+09 |-+ + : + + + |
> | + |
> 1.2e+09 +-----------------------------------------------------------------+
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Rong Chen
>