Re: [lkp] [locks] 7f3697e24d: +35.1% will-it-scale.per_thread_ops

From: Jeff Layton
Date: Thu Jan 28 2016 - 21:38:48 EST


On Fri, 29 Jan 2016 09:32:19 +0800
kernel test robot <ying.huang@xxxxxxxxxxxxxxx> wrote:

> FYI, we noticed the below changes on
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> commit 7f3697e24dc3820b10f445a4a7d914fc356012d1 ("locks: fix unlock when fcntl_setlk races with a close")
>
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
> gcc-4.9/performance/x86_64-rhel/debian-x86_64-2015-02-07.cgz/lkp-snb01/lock1/will-it-scale
>
> commit:
> 9189922675ecca0fab38931d86b676e9d79602dc
> 7f3697e24dc3820b10f445a4a7d914fc356012d1
>
> 9189922675ecca0f 7f3697e24dc3820b10f445a4a7
> ---------------- --------------------------
> %stddev %change %stddev
> \ | \
> 2376432 Â 0% +2.1% 2427484 Â 0% will-it-scale.per_process_ops
> 807889 Â 0% +35.1% 1091496 Â 4% will-it-scale.per_thread_ops
> 22.08 Â 2% +89.1% 41.75 Â 5% will-it-scale.time.user_time
> 1238371 Â 14% +100.4% 2481345 Â 39% cpuidle.C1E-SNB.time
> 3098 Â 57% -66.6% 1035 Â171% numa-numastat.node1.other_node
> 379.25 Â 8% -21.4% 298.00 Â 12% numa-vmstat.node0.nr_alloc_batch
> 22.08 Â 2% +89.1% 41.75 Â 5% time.user_time
> 1795 Â 4% +7.5% 1930 Â 2% vmstat.system.cs
> 0.54 Â 5% +136.9% 1.28 Â 10% perf-profile.cycles.___might_sleep.__might_sleep.kmem_cache_alloc.locks_alloc_lock.__posix_lock_file
> 1.65 Â 57% +245.2% 5.70 Â 29% perf-profile.cycles.__fdget_raw.sys_fcntl.entry_SYSCALL_64_fastpath
> 1.58 Â 59% +248.3% 5.50 Â 31% perf-profile.cycles.__fget.__fget_light.__fdget_raw.sys_fcntl.entry_SYSCALL_64_fastpath
> 1.62 Â 58% +246.3% 5.63 Â 30% perf-profile.cycles.__fget_light.__fdget_raw.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.00 Â -1% +Inf% 5.88 Â 11% perf-profile.cycles.__memset.locks_alloc_lock.__posix_lock_file.vfs_lock_file.do_lock_file_wait
> 2.50 Â 2% -100.0% 0.00 Â -1% perf-profile.cycles.__memset.locks_alloc_lock.__posix_lock_file.vfs_lock_file.fcntl_setlk
> 1.29 Â 4% +138.8% 3.09 Â 11% perf-profile.cycles.__memset.locks_alloc_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.47 Â 9% +144.4% 1.16 Â 11% perf-profile.cycles.__might_fault.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.37 Â 12% +140.3% 0.90 Â 9% perf-profile.cycles.__might_sleep.__might_fault.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.86 Â 6% +137.7% 2.05 Â 10% perf-profile.cycles.__might_sleep.kmem_cache_alloc.locks_alloc_lock.__posix_lock_file.vfs_lock_file
> 0.61 Â 14% +56.8% 0.95 Â 14% perf-profile.cycles.__might_sleep.kmem_cache_alloc.locks_alloc_lock.fcntl_setlk.sys_fcntl
> 0.00 Â -1% +Inf% 39.84 Â 12% perf-profile.cycles.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk.sys_fcntl
> 16.44 Â 3% -100.0% 0.00 Â -1% perf-profile.cycles.__posix_lock_file.vfs_lock_file.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.00 Â -1% +Inf% 1.77 Â 11% perf-profile.cycles._raw_spin_lock.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk
> 59.34 Â 1% -72.4% 16.36 Â 33% perf-profile.cycles._raw_spin_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.46 Â 11% +144.9% 1.13 Â 19% perf-profile.cycles.avc_has_perm.inode_has_perm.file_has_perm.selinux_file_fcntl.security_file_fcntl
> 0.87 Â 6% +103.2% 1.77 Â 12% perf-profile.cycles.avc_has_perm.inode_has_perm.file_has_perm.selinux_file_lock.security_file_lock
> 0.81 Â 4% +135.7% 1.90 Â 10% perf-profile.cycles.copy_user_generic_string.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.00 Â -1% +Inf% 41.86 Â 12% perf-profile.cycles.do_lock_file_wait.part.29.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.88 Â 6% +127.8% 2.00 Â 9% perf-profile.cycles.entry_SYSCALL_64
> 0.86 Â 4% +122.6% 1.92 Â 12% perf-profile.cycles.entry_SYSCALL_64_after_swapgs
> 84.98 Â 0% -9.1% 77.20 Â 2% perf-profile.cycles.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.76 Â 10% +142.1% 1.84 Â 14% perf-profile.cycles.file_has_perm.selinux_file_fcntl.security_file_fcntl.sys_fcntl.entry_SYSCALL_64_fastpath
> 1.35 Â 4% +106.3% 2.78 Â 11% perf-profile.cycles.file_has_perm.selinux_file_lock.security_file_lock.fcntl_setlk.sys_fcntl
> 0.00 Â -1% +Inf% 0.89 Â 12% perf-profile.cycles.flock_to_posix_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 6.90 Â 4% -48.6% 3.55 Â 27% perf-profile.cycles.fput.entry_SYSCALL_64_fastpath
> 0.51 Â 10% +140.5% 1.23 Â 16% perf-profile.cycles.inode_has_perm.isra.31.file_has_perm.selinux_file_fcntl.security_file_fcntl.sys_fcntl
> 0.98 Â 4% +97.7% 1.93 Â 11% perf-profile.cycles.inode_has_perm.isra.31.file_has_perm.selinux_file_lock.security_file_lock.fcntl_setlk
> 0.00 Â -1% +Inf% 6.56 Â 10% perf-profile.cycles.kmem_cache_alloc.locks_alloc_lock.__posix_lock_file.vfs_lock_file.do_lock_file_wait
> 2.75 Â 4% -100.0% 0.00 Â -1% perf-profile.cycles.kmem_cache_alloc.locks_alloc_lock.__posix_lock_file.vfs_lock_file.fcntl_setlk
> 1.53 Â 7% +119.7% 3.37 Â 13% perf-profile.cycles.kmem_cache_alloc.locks_alloc_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.00 Â -1% +Inf% 1.79 Â 11% perf-profile.cycles.kmem_cache_free.locks_free_lock.__posix_lock_file.vfs_lock_file.do_lock_file_wait
> 0.46 Â 14% +257.0% 1.66 Â 11% perf-profile.cycles.kmem_cache_free.locks_free_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.40 Â 7% +158.6% 1.05 Â 17% perf-profile.cycles.kmem_cache_free.locks_free_lock.locks_dispose_list.__posix_lock_file.vfs_lock_file
> 0.00 Â -1% +Inf% 0.96 Â 10% perf-profile.cycles.lg_local_lock.locks_insert_lock_ctx.__posix_lock_file.vfs_lock_file.do_lock_file_wait
> 0.00 Â -1% +Inf% 14.69 Â 10% perf-profile.cycles.locks_alloc_lock.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk
> 6.38 Â 3% -100.0% 0.00 Â -1% perf-profile.cycles.locks_alloc_lock.__posix_lock_file.vfs_lock_file.fcntl_setlk.sys_fcntl
> 3.28 Â 6% +127.1% 7.45 Â 12% perf-profile.cycles.locks_alloc_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.00 Â -1% +Inf% 9.75 Â 13% perf-profile.cycles.locks_delete_lock_ctx.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk
> 3.61 Â 1% -100.0% 0.00 Â -1% perf-profile.cycles.locks_delete_lock_ctx.__posix_lock_file.vfs_lock_file.fcntl_setlk.sys_fcntl
> 0.00 Â -1% +Inf% 1.84 Â 11% perf-profile.cycles.locks_dispose_list.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk
> 0.00 Â -1% +Inf% 2.42 Â 10% perf-profile.cycles.locks_free_lock.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk
> 1.00 Â 3% -100.0% 0.00 Â -1% perf-profile.cycles.locks_free_lock.__posix_lock_file.vfs_lock_file.fcntl_setlk.sys_fcntl
> 0.63 Â 11% +224.1% 2.05 Â 10% perf-profile.cycles.locks_free_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.00 Â -1% +Inf% 1.22 Â 14% perf-profile.cycles.locks_free_lock.locks_dispose_list.__posix_lock_file.vfs_lock_file.do_lock_file_wait
> 0.00 Â -1% +Inf% 6.17 Â 15% perf-profile.cycles.locks_insert_lock_ctx.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk
> 2.31 Â 6% -100.0% 0.00 Â -1% perf-profile.cycles.locks_insert_lock_ctx.__posix_lock_file.vfs_lock_file.fcntl_setlk.sys_fcntl
> 0.00 Â -1% +Inf% 8.96 Â 13% perf-profile.cycles.locks_unlink_lock_ctx.locks_delete_lock_ctx.__posix_lock_file.vfs_lock_file.do_lock_file_wait
> 3.27 Â 1% -100.0% 0.00 Â -1% perf-profile.cycles.locks_unlink_lock_ctx.locks_delete_lock_ctx.__posix_lock_file.vfs_lock_file.fcntl_setlk
> 53.88 Â 1% -79.7% 10.92 Â 46% perf-profile.cycles.native_queued_spin_lock_slowpath._raw_spin_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 2.75 Â 0% +183.3% 7.79 Â 13% perf-profile.cycles.put_pid.locks_unlink_lock_ctx.locks_delete_lock_ctx.__posix_lock_file.vfs_lock_file
> 1.11 Â 9% +137.2% 2.63 Â 14% perf-profile.cycles.security_file_fcntl.sys_fcntl.entry_SYSCALL_64_fastpath
> 1.69 Â 4% +118.2% 3.69 Â 11% perf-profile.cycles.security_file_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.91 Â 9% +139.0% 2.17 Â 14% perf-profile.cycles.selinux_file_fcntl.security_file_fcntl.sys_fcntl.entry_SYSCALL_64_fastpath
> 1.39 Â 4% +114.6% 2.97 Â 10% perf-profile.cycles.selinux_file_lock.security_file_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.00 Â -1% +Inf% 41.12 Â 12% perf-profile.cycles.vfs_lock_file.do_lock_file_wait.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 17.04 Â 3% -100.0% 0.00 Â -1% perf-profile.cycles.vfs_lock_file.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 34.75 Â148% +132.4% 80.75 Â 82% sched_debug.cfs_rq:/.load.8
> 15.00 Â 9% +198.3% 44.75 Â 72% sched_debug.cfs_rq:/.load_avg.21
> 25.00 Â 29% +574.0% 168.50 Â 78% sched_debug.cfs_rq:/.load_avg.9
> 38.47 Â 5% +29.1% 49.65 Â 26% sched_debug.cfs_rq:/.load_avg.avg
> 63.17 Â 10% +44.3% 91.16 Â 36% sched_debug.cfs_rq:/.load_avg.stddev
> 893865 Â 12% -12.5% 782455 Â 0% sched_debug.cfs_rq:/.min_vruntime.25
> 18.25 Â 26% +52.1% 27.75 Â 25% sched_debug.cfs_rq:/.runnable_load_avg.9
> -57635 Â-68% -196.4% 55548 Â130% sched_debug.cfs_rq:/.spread0.1
> -802264 Â-25% -29.5% -565458 Â-49% sched_debug.cfs_rq:/.spread0.8
> -804662 Â-25% -29.4% -567811 Â-48% sched_debug.cfs_rq:/.spread0.min
> 1233 Â 5% +30.9% 1614 Â 28% sched_debug.cfs_rq:/.tg_load_avg.0
> 1233 Â 5% +30.9% 1614 Â 28% sched_debug.cfs_rq:/.tg_load_avg.1
> 1228 Â 5% +30.3% 1601 Â 27% sched_debug.cfs_rq:/.tg_load_avg.10
> 1228 Â 5% +30.4% 1601 Â 27% sched_debug.cfs_rq:/.tg_load_avg.11
> 1228 Â 5% +30.3% 1601 Â 27% sched_debug.cfs_rq:/.tg_load_avg.12
> 1229 Â 5% +30.0% 1598 Â 27% sched_debug.cfs_rq:/.tg_load_avg.13
> 1228 Â 5% +30.1% 1598 Â 27% sched_debug.cfs_rq:/.tg_load_avg.14
> 1229 Â 5% +30.0% 1598 Â 27% sched_debug.cfs_rq:/.tg_load_avg.15
> 1226 Â 5% +30.3% 1598 Â 27% sched_debug.cfs_rq:/.tg_load_avg.16
> 1226 Â 5% +30.2% 1597 Â 27% sched_debug.cfs_rq:/.tg_load_avg.17
> 1227 Â 5% +30.1% 1595 Â 27% sched_debug.cfs_rq:/.tg_load_avg.18
> 1227 Â 5% +29.4% 1588 Â 26% sched_debug.cfs_rq:/.tg_load_avg.19
> 1233 Â 5% +30.4% 1609 Â 27% sched_debug.cfs_rq:/.tg_load_avg.2
> 1222 Â 5% +29.9% 1587 Â 26% sched_debug.cfs_rq:/.tg_load_avg.20
> 1223 Â 5% +24.2% 1519 Â 20% sched_debug.cfs_rq:/.tg_load_avg.21
> 1223 Â 5% +23.8% 1515 Â 20% sched_debug.cfs_rq:/.tg_load_avg.22
> 1223 Â 5% +23.9% 1515 Â 20% sched_debug.cfs_rq:/.tg_load_avg.23
> 1223 Â 5% +23.9% 1515 Â 20% sched_debug.cfs_rq:/.tg_load_avg.24
> 1223 Â 5% +23.5% 1511 Â 19% sched_debug.cfs_rq:/.tg_load_avg.25
> 1224 Â 5% +23.5% 1512 Â 19% sched_debug.cfs_rq:/.tg_load_avg.26
> 1223 Â 5% +23.1% 1506 Â 19% sched_debug.cfs_rq:/.tg_load_avg.27
> 1223 Â 5% +22.5% 1499 Â 19% sched_debug.cfs_rq:/.tg_load_avg.28
> 1224 Â 5% +22.5% 1499 Â 19% sched_debug.cfs_rq:/.tg_load_avg.29
> 1233 Â 5% +30.3% 1607 Â 27% sched_debug.cfs_rq:/.tg_load_avg.3
> 1223 Â 5% +22.2% 1495 Â 18% sched_debug.cfs_rq:/.tg_load_avg.30
> 1224 Â 5% +22.0% 1493 Â 19% sched_debug.cfs_rq:/.tg_load_avg.31
> 1234 Â 5% +30.0% 1604 Â 28% sched_debug.cfs_rq:/.tg_load_avg.4
> 1233 Â 5% +30.0% 1604 Â 28% sched_debug.cfs_rq:/.tg_load_avg.5
> 1231 Â 5% +30.3% 1604 Â 28% sched_debug.cfs_rq:/.tg_load_avg.6
> 1233 Â 5% +30.0% 1603 Â 27% sched_debug.cfs_rq:/.tg_load_avg.7
> 1231 Â 5% +30.1% 1601 Â 27% sched_debug.cfs_rq:/.tg_load_avg.8
> 1228 Â 5% +30.3% 1601 Â 27% sched_debug.cfs_rq:/.tg_load_avg.9
> 1228 Â 5% +27.8% 1569 Â 24% sched_debug.cfs_rq:/.tg_load_avg.avg
> 1246 Â 5% +30.7% 1628 Â 27% sched_debug.cfs_rq:/.tg_load_avg.max
> 1212 Â 5% +22.2% 1481 Â 19% sched_debug.cfs_rq:/.tg_load_avg.min
> 15.00 Â 9% +198.3% 44.75 Â 72% sched_debug.cfs_rq:/.tg_load_avg_contrib.21
> 25.00 Â 29% +574.0% 168.50 Â 78% sched_debug.cfs_rq:/.tg_load_avg_contrib.9
> 38.53 Â 5% +29.0% 49.71 Â 26% sched_debug.cfs_rq:/.tg_load_avg_contrib.avg
> 63.34 Â 10% +44.1% 91.30 Â 36% sched_debug.cfs_rq:/.tg_load_avg_contrib.stddev
> 532.25 Â 2% +8.5% 577.50 Â 6% sched_debug.cfs_rq:/.util_avg.15
> 210.75 Â 14% -14.4% 180.50 Â 4% sched_debug.cfs_rq:/.util_avg.29
> 450.00 Â 22% +50.7% 678.00 Â 18% sched_debug.cfs_rq:/.util_avg.9
> 955572 Â 4% -10.2% 857813 Â 5% sched_debug.cpu.avg_idle.6
> 23.99 Â 60% -76.2% 5.71 Â 24% sched_debug.cpu.clock.stddev
> 23.99 Â 60% -76.2% 5.71 Â 24% sched_debug.cpu.clock_task.stddev
> 2840 Â 37% -47.4% 1492 Â 65% sched_debug.cpu.curr->pid.25
> 34.75 Â148% +132.4% 80.75 Â 82% sched_debug.cpu.load.8
> 61776 Â 7% -7.1% 57380 Â 0% sched_debug.cpu.nr_load_updates.25
> 6543 Â 2% +20.4% 7879 Â 9% sched_debug.cpu.nr_switches.0
> 5256 Â 23% +177.1% 14566 Â 52% sched_debug.cpu.nr_switches.27
> 7915 Â 3% +8.7% 8605 Â 3% sched_debug.cpu.nr_switches.avg
> -0.25 Â-519% +1900.0% -5.00 Â-24% sched_debug.cpu.nr_uninterruptible.12
> 2.00 Â 93% -125.0% -0.50 Â-300% sched_debug.cpu.nr_uninterruptible.24
> 17468 Â 14% +194.3% 51413 Â 75% sched_debug.cpu.sched_count.15
> 2112 Â 2% +20.8% 2552 Â 11% sched_debug.cpu.sched_goidle.0
> 2103 Â 34% +219.0% 6709 Â 55% sched_debug.cpu.sched_goidle.27
> 3159 Â 3% +8.2% 3418 Â 4% sched_debug.cpu.sched_goidle.avg
> 1323 Â 64% -72.7% 361.50 Â 15% sched_debug.cpu.ttwu_count.23
> 3264 Â 12% +94.4% 6347 Â 41% sched_debug.cpu.ttwu_count.27
> 3860 Â 3% +9.0% 4208 Â 3% sched_debug.cpu.ttwu_count.avg
> 2358 Â 3% +28.7% 3035 Â 9% sched_debug.cpu.ttwu_local.0
> 1110 Â 22% +54.6% 1716 Â 28% sched_debug.cpu.ttwu_local.27
> 1814 Â 8% +16.1% 2106 Â 5% sched_debug.cpu.ttwu_local.stddev
>
>
> lkp-snb01: Sandy Bridge-EP
> Memory: 32G
>
> will-it-scale.per_thread_ops
>
> 1.2e+06 ++---------------------------------------------------------------+
> | O |
> 1.15e+06 O+O O O O O O O O |
> 1.1e+06 ++ |
> | O O O O O OO |
> 1.05e+06 ++ O O |
> 1e+06 ++ |
> | |
> 950000 ++ |
> 900000 ++ |
> | |
> 850000 ++ |
> 800000 *+*.*.*.*.*.*.*.*.*.*.*.*. .*.*. *.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*
> | * * |
> 750000 ++---------------------------------------------------------------+
>
>
> will-it-scale.time.user_time
>
> 50 ++---------------------------------------------------------------------+
> | |
> 45 ++ O O O O O O |
> O O O O |
> | O O O O O |
> 40 ++ O O O O |
> | |
> 35 ++ |
> | |
> 30 ++ |
> | |
> | * |
> 25 ++ + + |
> *.*.*.*..*.* *.*.*..*.*.*.*.*.*.*..*.*.*.*.*.*.*..*.*.*.*.*.*..*.*.*.*
> 20 ++---------------------------------------------------------------------+
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
> To reproduce:
>
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml # job file is attached in this email
> bin/lkp run job.yaml
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Ying Huang

Thanks...

Huh...I'm stumped on this one. If anything I would have expected better
performance with this patch since we don't even take the file_lock or
do the fcheck in the F_UNLCK codepath now, or when there is an error.

I'll see if I can reproduce it on my own test rig, but I'd welcome
ideas of where and how this performance regression could have crept in.

--
Jeff Layton <jeff.layton@xxxxxxxxxxxxxxx>