Re: [LKP] [lkp] [locks] 7f3697e24d: +35.1% will-it-scale.per_thread_ops

From: Huang, Ying
Date: Thu Jan 28 2016 - 21:52:58 EST


Jeff Layton <jeff.layton@xxxxxxxxxxxxxxx> writes:

> On Fri, 29 Jan 2016 09:32:19 +0800
> kernel test robot <ying.huang@xxxxxxxxxxxxxxx> wrote:
>
>> FYI, we noticed the below changes on
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>> commit 7f3697e24dc3820b10f445a4a7d914fc356012d1 ("locks: fix unlock when fcntl_setlk races with a close")
>>
>>
>> =========================================================================================
>> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
>> gcc-4.9/performance/x86_64-rhel/debian-x86_64-2015-02-07.cgz/lkp-snb01/lock1/will-it-scale
>>
>> commit:
>> 9189922675ecca0fab38931d86b676e9d79602dc
>> 7f3697e24dc3820b10f445a4a7d914fc356012d1
>>
>> 9189922675ecca0f 7f3697e24dc3820b10f445a4a7
>> ---------------- --------------------------
>> %stddev %change %stddev
>> \ | \
>> 2376432 ± 0% +2.1% 2427484 ± 0% will-it-scale.per_process_ops
>> 807889 ± 0% +35.1% 1091496 ± 4% will-it-scale.per_thread_ops
>> 22.08 ± 2% +89.1% 41.75 ± 5% will-it-scale.time.user_time
>> 1238371 ± 14% +100.4% 2481345 ± 39% cpuidle.C1E-SNB.time
>> 3098 ± 57% -66.6% 1035 ±171% numa-numastat.node1.other_node
>> 379.25 ± 8% -21.4% 298.00 ± 12% numa-vmstat.node0.nr_alloc_batch
>> 22.08 ± 2% +89.1% 41.75 ± 5% time.user_time
>> 1795 ± 4% +7.5% 1930 ± 2% vmstat.system.cs
>> 0.54 ± 5% +136.9% 1.28 ± 10% perf-profile.cycles.___might_sleep.__might_sleep.kmem_cache_alloc.locks_alloc_lock.__posix_lock_file
>> 1.65 ± 57% +245.2% 5.70 ± 29% perf-profile.cycles.__fdget_raw.sys_fcntl.entry_SYSCALL_64_fastpath
>> 1.58 ± 59% +248.3% 5.50 ± 31% perf-profile.cycles.__fget.__fget_light.__fdget_raw.sys_fcntl.entry_SYSCALL_64_fastpath
>> 1.62 ± 58% +246.3% 5.63 ± 30% perf-profile.cycles.__fget_light.__fdget_raw.sys_fcntl.entry_SYSCALL_64_fastpath
>> 0.00 ± -1% +Inf% 5.88 ± 11% perf-profile.cycles.__memset.locks_alloc_lock.__posix_lock_file.vfs_lock_file.do_lock_file_wait
>> 2.50 ± 2% -100.0% 0.00 ± -1% perf-profile.cycles.__memset.locks_alloc_lock.__posix_lock_file.vfs_lock_file.fcntl_setlk
>> 1.29 ± 4% +138.8% 3.09 ± 11% perf-profile.cycles.__memset.locks_alloc_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
>> 0.47 ± 9% +144.4% 1.16 ± 11% perf-profile.cycles.__might_fault.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
>> 0.37 ± 12% +140.3% 0.90 ± 9% perf-profile.cycles.__might_sleep.__might_fault.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
>> 0.86 ± 6% +137.7% 2.05 ± 10% perf-profile.cycles.__might_sleep.kmem_cache_alloc.locks_alloc_lock.__posix_lock_file.vfs_lock_file
>> 0.61 ± 14% +56.8% 0.95 ± 14% perf-profile.cycles.__might_sleep.kmem_cache_alloc.locks_alloc_lock.fcntl_setlk.sys_fcntl
>> 0.00 ± -1% +Inf% 39.84 ± 12% perf-profile.cycles.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk.sys_fcntl
>> 16.44 ± 3% -100.0% 0.00 ± -1% perf-profile.cycles.__posix_lock_file.vfs_lock_file.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
>> 0.00 ± -1% +Inf% 1.77 ± 11% perf-profile.cycles._raw_spin_lock.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk
>> 59.34 ± 1% -72.4% 16.36 ± 33% perf-profile.cycles._raw_spin_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
>> 0.46 ± 11% +144.9% 1.13 ± 19% perf-profile.cycles.avc_has_perm.inode_has_perm.file_has_perm.selinux_file_fcntl.security_file_fcntl
>> 0.87 ± 6% +103.2% 1.77 ± 12% perf-profile.cycles.avc_has_perm.inode_has_perm.file_has_perm.selinux_file_lock.security_file_lock
>> 0.81 ± 4% +135.7% 1.90 ± 10% perf-profile.cycles.copy_user_generic_string.sys_fcntl.entry_SYSCALL_64_fastpath
>> 0.00 ± -1% +Inf% 41.86 ± 12% perf-profile.cycles.do_lock_file_wait.part.29.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
>> 0.88 ± 6% +127.8% 2.00 ± 9% perf-profile.cycles.entry_SYSCALL_64
>> 0.86 ± 4% +122.6% 1.92 ± 12% perf-profile.cycles.entry_SYSCALL_64_after_swapgs
>> 84.98 ± 0% -9.1% 77.20 ± 2% perf-profile.cycles.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
>> 0.76 ± 10% +142.1% 1.84 ± 14% perf-profile.cycles.file_has_perm.selinux_file_fcntl.security_file_fcntl.sys_fcntl.entry_SYSCALL_64_fastpath
>> 1.35 ± 4% +106.3% 2.78 ± 11% perf-profile.cycles.file_has_perm.selinux_file_lock.security_file_lock.fcntl_setlk.sys_fcntl
>> 0.00 ± -1% +Inf% 0.89 ± 12% perf-profile.cycles.flock_to_posix_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
>> 6.90 ± 4% -48.6% 3.55 ± 27% perf-profile.cycles.fput.entry_SYSCALL_64_fastpath
>> 0.51 ± 10% +140.5% 1.23 ± 16% perf-profile.cycles.inode_has_perm.isra.31.file_has_perm.selinux_file_fcntl.security_file_fcntl.sys_fcntl
>> 0.98 ± 4% +97.7% 1.93 ± 11% perf-profile.cycles.inode_has_perm.isra.31.file_has_perm.selinux_file_lock.security_file_lock.fcntl_setlk
>> 0.00 ± -1% +Inf% 6.56 ± 10% perf-profile.cycles.kmem_cache_alloc.locks_alloc_lock.__posix_lock_file.vfs_lock_file.do_lock_file_wait
>> 2.75 ± 4% -100.0% 0.00 ± -1% perf-profile.cycles.kmem_cache_alloc.locks_alloc_lock.__posix_lock_file.vfs_lock_file.fcntl_setlk
>> 1.53 ± 7% +119.7% 3.37 ± 13% perf-profile.cycles.kmem_cache_alloc.locks_alloc_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
>> 0.00 ± -1% +Inf% 1.79 ± 11% perf-profile.cycles.kmem_cache_free.locks_free_lock.__posix_lock_file.vfs_lock_file.do_lock_file_wait
>> 0.46 ± 14% +257.0% 1.66 ± 11% perf-profile.cycles.kmem_cache_free.locks_free_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
>> 0.40 ± 7% +158.6% 1.05 ± 17% perf-profile.cycles.kmem_cache_free.locks_free_lock.locks_dispose_list.__posix_lock_file.vfs_lock_file
>> 0.00 ± -1% +Inf% 0.96 ± 10% perf-profile.cycles.lg_local_lock.locks_insert_lock_ctx.__posix_lock_file.vfs_lock_file.do_lock_file_wait
>> 0.00 ± -1% +Inf% 14.69 ± 10% perf-profile.cycles.locks_alloc_lock.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk
>> 6.38 ± 3% -100.0% 0.00 ± -1% perf-profile.cycles.locks_alloc_lock.__posix_lock_file.vfs_lock_file.fcntl_setlk.sys_fcntl
>> 3.28 ± 6% +127.1% 7.45 ± 12% perf-profile.cycles.locks_alloc_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
>> 0.00 ± -1% +Inf% 9.75 ± 13% perf-profile.cycles.locks_delete_lock_ctx.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk
>> 3.61 ± 1% -100.0% 0.00 ± -1% perf-profile.cycles.locks_delete_lock_ctx.__posix_lock_file.vfs_lock_file.fcntl_setlk.sys_fcntl
>> 0.00 ± -1% +Inf% 1.84 ± 11% perf-profile.cycles.locks_dispose_list.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk
>> 0.00 ± -1% +Inf% 2.42 ± 10% perf-profile.cycles.locks_free_lock.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk
>> 1.00 ± 3% -100.0% 0.00 ± -1% perf-profile.cycles.locks_free_lock.__posix_lock_file.vfs_lock_file.fcntl_setlk.sys_fcntl
>> 0.63 ± 11% +224.1% 2.05 ± 10% perf-profile.cycles.locks_free_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
>> 0.00 ± -1% +Inf% 1.22 ± 14% perf-profile.cycles.locks_free_lock.locks_dispose_list.__posix_lock_file.vfs_lock_file.do_lock_file_wait
>> 0.00 ± -1% +Inf% 6.17 ± 15% perf-profile.cycles.locks_insert_lock_ctx.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk
>> 2.31 ± 6% -100.0% 0.00 ± -1% perf-profile.cycles.locks_insert_lock_ctx.__posix_lock_file.vfs_lock_file.fcntl_setlk.sys_fcntl
>> 0.00 ± -1% +Inf% 8.96 ± 13% perf-profile.cycles.locks_unlink_lock_ctx.locks_delete_lock_ctx.__posix_lock_file.vfs_lock_file.do_lock_file_wait
>> 3.27 ± 1% -100.0% 0.00 ± -1% perf-profile.cycles.locks_unlink_lock_ctx.locks_delete_lock_ctx.__posix_lock_file.vfs_lock_file.fcntl_setlk
>> 53.88 ± 1% -79.7% 10.92 ± 46% perf-profile.cycles.native_queued_spin_lock_slowpath._raw_spin_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
>> 2.75 ± 0% +183.3% 7.79 ± 13% perf-profile.cycles.put_pid.locks_unlink_lock_ctx.locks_delete_lock_ctx.__posix_lock_file.vfs_lock_file
>> 1.11 ± 9% +137.2% 2.63 ± 14% perf-profile.cycles.security_file_fcntl.sys_fcntl.entry_SYSCALL_64_fastpath
>> 1.69 ± 4% +118.2% 3.69 ± 11% perf-profile.cycles.security_file_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
>> 0.91 ± 9% +139.0% 2.17 ± 14% perf-profile.cycles.selinux_file_fcntl.security_file_fcntl.sys_fcntl.entry_SYSCALL_64_fastpath
>> 1.39 ± 4% +114.6% 2.97 ± 10% perf-profile.cycles.selinux_file_lock.security_file_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
>> 0.00 ± -1% +Inf% 41.12 ± 12% perf-profile.cycles.vfs_lock_file.do_lock_file_wait.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
>> 17.04 ± 3% -100.0% 0.00 ± -1% perf-profile.cycles.vfs_lock_file.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
>> 34.75 ±148% +132.4% 80.75 ± 82% sched_debug.cfs_rq:/.load.8
>> 15.00 ± 9% +198.3% 44.75 ± 72% sched_debug.cfs_rq:/.load_avg.21
>> 25.00 ± 29% +574.0% 168.50 ± 78% sched_debug.cfs_rq:/.load_avg.9
>> 38.47 ± 5% +29.1% 49.65 ± 26% sched_debug.cfs_rq:/.load_avg.avg
>> 63.17 ± 10% +44.3% 91.16 ± 36% sched_debug.cfs_rq:/.load_avg.stddev
>> 893865 ± 12% -12.5% 782455 ± 0% sched_debug.cfs_rq:/.min_vruntime.25
>> 18.25 ± 26% +52.1% 27.75 ± 25% sched_debug.cfs_rq:/.runnable_load_avg.9
>> -57635 ±-68% -196.4% 55548 ±130% sched_debug.cfs_rq:/.spread0.1
>> -802264 ±-25% -29.5% -565458 ±-49% sched_debug.cfs_rq:/.spread0.8
>> -804662 ±-25% -29.4% -567811 ±-48% sched_debug.cfs_rq:/.spread0.min
>> 1233 ± 5% +30.9% 1614 ± 28% sched_debug.cfs_rq:/.tg_load_avg.0
>> 1233 ± 5% +30.9% 1614 ± 28% sched_debug.cfs_rq:/.tg_load_avg.1
>> 1228 ± 5% +30.3% 1601 ± 27% sched_debug.cfs_rq:/.tg_load_avg.10
>> 1228 ± 5% +30.4% 1601 ± 27% sched_debug.cfs_rq:/.tg_load_avg.11
>> 1228 ± 5% +30.3% 1601 ± 27% sched_debug.cfs_rq:/.tg_load_avg.12
>> 1229 ± 5% +30.0% 1598 ± 27% sched_debug.cfs_rq:/.tg_load_avg.13
>> 1228 ± 5% +30.1% 1598 ± 27% sched_debug.cfs_rq:/.tg_load_avg.14
>> 1229 ± 5% +30.0% 1598 ± 27% sched_debug.cfs_rq:/.tg_load_avg.15
>> 1226 ± 5% +30.3% 1598 ± 27% sched_debug.cfs_rq:/.tg_load_avg.16
>> 1226 ± 5% +30.2% 1597 ± 27% sched_debug.cfs_rq:/.tg_load_avg.17
>> 1227 ± 5% +30.1% 1595 ± 27% sched_debug.cfs_rq:/.tg_load_avg.18
>> 1227 ± 5% +29.4% 1588 ± 26% sched_debug.cfs_rq:/.tg_load_avg.19
>> 1233 ± 5% +30.4% 1609 ± 27% sched_debug.cfs_rq:/.tg_load_avg.2
>> 1222 ± 5% +29.9% 1587 ± 26% sched_debug.cfs_rq:/.tg_load_avg.20
>> 1223 ± 5% +24.2% 1519 ± 20% sched_debug.cfs_rq:/.tg_load_avg.21
>> 1223 ± 5% +23.8% 1515 ± 20% sched_debug.cfs_rq:/.tg_load_avg.22
>> 1223 ± 5% +23.9% 1515 ± 20% sched_debug.cfs_rq:/.tg_load_avg.23
>> 1223 ± 5% +23.9% 1515 ± 20% sched_debug.cfs_rq:/.tg_load_avg.24
>> 1223 ± 5% +23.5% 1511 ± 19% sched_debug.cfs_rq:/.tg_load_avg.25
>> 1224 ± 5% +23.5% 1512 ± 19% sched_debug.cfs_rq:/.tg_load_avg.26
>> 1223 ± 5% +23.1% 1506 ± 19% sched_debug.cfs_rq:/.tg_load_avg.27
>> 1223 ± 5% +22.5% 1499 ± 19% sched_debug.cfs_rq:/.tg_load_avg.28
>> 1224 ± 5% +22.5% 1499 ± 19% sched_debug.cfs_rq:/.tg_load_avg.29
>> 1233 ± 5% +30.3% 1607 ± 27% sched_debug.cfs_rq:/.tg_load_avg.3
>> 1223 ± 5% +22.2% 1495 ± 18% sched_debug.cfs_rq:/.tg_load_avg.30
>> 1224 ± 5% +22.0% 1493 ± 19% sched_debug.cfs_rq:/.tg_load_avg.31
>> 1234 ± 5% +30.0% 1604 ± 28% sched_debug.cfs_rq:/.tg_load_avg.4
>> 1233 ± 5% +30.0% 1604 ± 28% sched_debug.cfs_rq:/.tg_load_avg.5
>> 1231 ± 5% +30.3% 1604 ± 28% sched_debug.cfs_rq:/.tg_load_avg.6
>> 1233 ± 5% +30.0% 1603 ± 27% sched_debug.cfs_rq:/.tg_load_avg.7
>> 1231 ± 5% +30.1% 1601 ± 27% sched_debug.cfs_rq:/.tg_load_avg.8
>> 1228 ± 5% +30.3% 1601 ± 27% sched_debug.cfs_rq:/.tg_load_avg.9
>> 1228 ± 5% +27.8% 1569 ± 24% sched_debug.cfs_rq:/.tg_load_avg.avg
>> 1246 ± 5% +30.7% 1628 ± 27% sched_debug.cfs_rq:/.tg_load_avg.max
>> 1212 ± 5% +22.2% 1481 ± 19% sched_debug.cfs_rq:/.tg_load_avg.min
>> 15.00 ± 9% +198.3% 44.75 ± 72% sched_debug.cfs_rq:/.tg_load_avg_contrib.21
>> 25.00 ± 29% +574.0% 168.50 ± 78% sched_debug.cfs_rq:/.tg_load_avg_contrib.9
>> 38.53 ± 5% +29.0% 49.71 ± 26% sched_debug.cfs_rq:/.tg_load_avg_contrib.avg
>> 63.34 ± 10% +44.1% 91.30 ± 36% sched_debug.cfs_rq:/.tg_load_avg_contrib.stddev
>> 532.25 ± 2% +8.5% 577.50 ± 6% sched_debug.cfs_rq:/.util_avg.15
>> 210.75 ± 14% -14.4% 180.50 ± 4% sched_debug.cfs_rq:/.util_avg.29
>> 450.00 ± 22% +50.7% 678.00 ± 18% sched_debug.cfs_rq:/.util_avg.9
>> 955572 ± 4% -10.2% 857813 ± 5% sched_debug.cpu.avg_idle.6
>> 23.99 ± 60% -76.2% 5.71 ± 24% sched_debug.cpu.clock.stddev
>> 23.99 ± 60% -76.2% 5.71 ± 24% sched_debug.cpu.clock_task.stddev
>> 2840 ± 37% -47.4% 1492 ± 65% sched_debug.cpu.curr->pid.25
>> 34.75 ±148% +132.4% 80.75 ± 82% sched_debug.cpu.load.8
>> 61776 ± 7% -7.1% 57380 ± 0% sched_debug.cpu.nr_load_updates.25
>> 6543 ± 2% +20.4% 7879 ± 9% sched_debug.cpu.nr_switches.0
>> 5256 ± 23% +177.1% 14566 ± 52% sched_debug.cpu.nr_switches.27
>> 7915 ± 3% +8.7% 8605 ± 3% sched_debug.cpu.nr_switches.avg
>> -0.25 ±-519% +1900.0% -5.00 ±-24% sched_debug.cpu.nr_uninterruptible.12
>> 2.00 ± 93% -125.0% -0.50 ±-300% sched_debug.cpu.nr_uninterruptible.24
>> 17468 ± 14% +194.3% 51413 ± 75% sched_debug.cpu.sched_count.15
>> 2112 ± 2% +20.8% 2552 ± 11% sched_debug.cpu.sched_goidle.0
>> 2103 ± 34% +219.0% 6709 ± 55% sched_debug.cpu.sched_goidle.27
>> 3159 ± 3% +8.2% 3418 ± 4% sched_debug.cpu.sched_goidle.avg
>> 1323 ± 64% -72.7% 361.50 ± 15% sched_debug.cpu.ttwu_count.23
>> 3264 ± 12% +94.4% 6347 ± 41% sched_debug.cpu.ttwu_count.27
>> 3860 ± 3% +9.0% 4208 ± 3% sched_debug.cpu.ttwu_count.avg
>> 2358 ± 3% +28.7% 3035 ± 9% sched_debug.cpu.ttwu_local.0
>> 1110 ± 22% +54.6% 1716 ± 28% sched_debug.cpu.ttwu_local.27
>> 1814 ± 8% +16.1% 2106 ± 5% sched_debug.cpu.ttwu_local.stddev
>>
>>
>> lkp-snb01: Sandy Bridge-EP
>> Memory: 32G
>>
>> will-it-scale.per_thread_ops
>>
>> 1.2e+06 ++---------------------------------------------------------------+
>> | O |
>> 1.15e+06 O+O O O O O O O O |
>> 1.1e+06 ++ |
>> | O O O O O OO |
>> 1.05e+06 ++ O O |
>> 1e+06 ++ |
>> | |
>> 950000 ++ |
>> 900000 ++ |
>> | |
>> 850000 ++ |
>> 800000 *+*.*.*.*.*.*.*.*.*.*.*.*. .*.*. *.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*
>> | * * |
>> 750000 ++---------------------------------------------------------------+
>>
>>
>> will-it-scale.time.user_time
>>
>> 50 ++---------------------------------------------------------------------+
>> | |
>> 45 ++ O O O O O O |
>> O O O O |
>> | O O O O O |
>> 40 ++ O O O O |
>> | |
>> 35 ++ |
>> | |
>> 30 ++ |
>> | |
>> | * |
>> 25 ++ + + |
>> *.*.*.*..*.* *.*.*..*.*.*.*.*.*.*..*.*.*.*.*.*.*..*.*.*.*.*.*..*.*.*.*
>> 20 ++---------------------------------------------------------------------+
>>
>>
>> [*] bisect-good sample
>> [O] bisect-bad sample
>>
>> To reproduce:
>>
>> git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
>> cd lkp-tests
>> bin/lkp install job.yaml # job file is attached in this email
>> bin/lkp run job.yaml
>>
>>
>> Disclaimer:
>> Results have been estimated based on internal Intel analysis and are provided
>> for informational purposes only. Any difference in system hardware or software
>> design or configuration may affect actual performance.
>>
>>
>> Thanks,
>> Ying Huang
>
> Thanks...
>
> Huh...I'm stumped on this one. If anything, I would have expected better
> performance with this patch, since we don't even take the file_lock or
> do the fcheck in the F_UNLCK codepath now, or when there is an error.
>
> I'll see if I can reproduce it on my own test rig, but I'd welcome
> ideas of where and how this performance regression could have crept in.

This is a performance improvement rather than a performance regression.
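
The gain shows up mostly in the threaded case, very likely because all of the
benchmark threads share current->files: before this commit, every F_SETLK call,
including the F_UNLCK half of each iteration, took current->files->file_lock
around the fcheck() race check, and in the profile above the time spent in
native_queued_spin_lock_slowpath under fcntl_setlk drops from ~54% to ~11% of
cycles. For reference, each will-it-scale lock1 task runs roughly the loop
below (a simplified sketch for illustration, not the actual lock1.c source):

/* Simplified sketch of the will-it-scale "lock1" per-task loop: repeatedly
 * take and drop a POSIX write lock on a private temporary file with
 * fcntl(F_SETLK), so nearly all of the time goes into the
 * fcntl_setlk()/__posix_lock_file() paths shown in the profile above.
 */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char tmpfile[] = "/tmp/willitscale.XXXXXX";
	int fd = mkstemp(tmpfile);
	struct flock lck;
	unsigned long i;

	if (fd < 0)
		return 1;
	unlink(tmpfile);			/* keep the fd, drop the name */

	memset(&lck, 0, sizeof(lck));
	lck.l_whence = SEEK_SET;
	lck.l_start = 0;
	lck.l_len = 1;

	for (i = 0; i < 10000000; i++) {
		lck.l_type = F_WRLCK;		/* take the write lock ... */
		if (fcntl(fd, F_SETLK, &lck) < 0)
			return 1;
		lck.l_type = F_UNLCK;		/* ... and drop it again */
		if (fcntl(fd, F_SETLK, &lck) < 0)
			return 1;
	}
	close(fd);
	return 0;
}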

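As Jeff says, the commit restructures the close/fcntl race check in
fcntl_setlk() so that the F_UNLCK and error paths return without taking
current->files->file_lock or calling fcheck() at all. The check now has
roughly the following shape (paraphrased from the commit for illustration,
not the verbatim fs/locks.c hunk):

	/*
	 * Paraphrased shape of the race check after the commit: only
	 * re-check the fd when a lock was actually acquired, so F_UNLCK
	 * and error cases skip the fcheck()/file_lock round trip.
	 */
	if (!error && file_lock->fl_type != F_UNLCK) {
		/*
		 * The spin_lock prevents reordering between the update of
		 * i_flctx->flc_posix and the check done in close().
		 */
		spin_lock(&current->files->file_lock);
		f = fcheck(fd);
		spin_unlock(&current->files->file_lock);
		if (f != filp) {
			/* the fd was closed under us: undo the lock we set */
			file_lock->fl_type = F_UNLCK;
			error = do_lock_file_wait(filp, cmd, file_lock);
			WARN_ON_ONCE(error);
			error = -EBADF;
		}
	}

Since lock1 alternates F_WRLCK and F_UNLCK requests, roughly half of those
file_lock acquisitions go away per iteration, which fits the drop in
_raw_spin_lock time under fcntl_setlk in the profile above.
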
Best Regards,
Huang, Ying