Re: [LKP] [lkp-robot] [fs/locks] 52306e882f: stress-ng.lockofd.ops_per_sec -11% regression

From: Jeff Layton
Date: Tue Dec 05 2017 - 06:01:10 EST


On Tue, 2017-12-05 at 13:57 +0800, Aaron Lu wrote:
> On Wed, Nov 08, 2017 at 03:22:33PM +0800, Aaron Lu wrote:
> > On Thu, Sep 28, 2017 at 04:02:23PM +0800, kernel test robot wrote:
> > >
> > > Greeting,
> > >
> > > FYI, we noticed a -11% regression of stress-ng.lockofd.ops_per_sec due to commit:
> > >
> > >
> > > commit: 52306e882f77d3fd73f91435c41373d634acc5d2 ("fs/locks: Use allocation rather than the stack in fcntl_getlk()")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > It's been a while, I wonder what do you think of this regression?
> >
> > The test stresses byte-range locks AFAICS and since the commit uses
> > dynamic allocation instead of stack for the 'struct file_lock', it sounds
> > natural the performance regressed for this test.
> >
> > Now the question is, do we care about the performance regression here?
>
> Appreciated it if you can share your opinion on this, thanks.
>
> Regards,
> Aaron
>

Sorry I missed your earlier mail about this. My feeling is to not worry
about it. struct file_lock is rather large, so putting it on the stack
was always a bit dangerous, and F_GETLK is a rather uncommon operation
anyway.

That said, if there are real-world workloads that have regressed because
of this patch, I'm definitely open to backing it out.

Does anyone else have opinions on the matter?


> > Feel free to let me know if you need any other data.
> >
> > Thanks for your time.
> >
> > > in testcase: stress-ng
> > > on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory
> > > with following parameters:
> > >
> > > testtime: 1s
> > > class: filesystem
> > > cpufreq_governor: performance
> > >
> > >
> > >
> > >
> > > Details are as below:
> > > -------------------------------------------------------------------------------------------------->
> > >
> > >
> > > To reproduce:
> > >
> > > git clone https://github.com/intel/lkp-tests.git
> > > cd lkp-tests
> > > bin/lkp install job.yaml # job file is attached in this email
> > > bin/lkp run job.yaml
> > >
> > > testcase/path_params/tbox_group/run: stress-ng/1s-filesystem-performance/lkp-bdw-ep6
> > >
> > > v4.13-rc1 52306e882f77d3fd73f91435c4
> > > ---------------- --------------------------
> > > %stddev change %stddev
> > > \ | \
> > > 1.219e+08 -11% 1.09e+08 stress-ng.lockofd.ops_per_sec
> > > 1.229e+08 -10% 1.103e+08 stress-ng.locka.ops_per_sec
> > > 1.233e+08 -10% 1.105e+08 stress-ng.locka.ops
> > > 1.223e+08 -11% 1.093e+08 stress-ng.lockofd.ops
> > > 1061237 10% 1168476 stress-ng.eventfd.ops
> > > 1061205 10% 1168414 stress-ng.eventfd.ops_per_sec
> > > 2913174 9% 3163165 stress-ng.time.voluntary_context_switches
> > > 89.90 -4% 86.58 stress-ng.time.user_time
> > > 26510 -6% 24822 stress-ng.io.ops
> > > 26489 -6% 24798 stress-ng.io.ops_per_sec
> > > 885499 Â 14% 18% 1042236 perf-stat.cpu-migrations
> > > 2.537e+08 10% 2.783e+08 perf-stat.node-store-misses
> > > 1067830 Â 4% 8% 1154877 Â 3% perf-stat.page-faults
> > > 5384755 Â 4% 7% 5747689 perf-stat.context-switches
> > > 32.28 7% 34.42 Â 3% perf-stat.node-store-miss-rate%
> > > 12245 Â110% -7e+03 5367 Â 29% latency_stats.avg.call_usermodehelper_exec.__request_module.get_fs_type.do_mount.SyS_mount.entry_SYSCALL_64_fastpath
> > > 311261 Â173% -3e+05 11702 Â100% latency_stats.avg.tty_release_struct.tty_release.__fput.____fput.task_work_run.exit_to_usermode_loop.syscall_return_slowpath.entry_SYSCALL_64_fastpath
> > > 1472 Â 60% 4e+03 5144 Â 97% latency_stats.max.sync_inodes_sb.sync_inodes_one_sb.iterate_supers.sys_sync.entry_SYSCALL_64_fastpath
> > > 225 Â 39% 3e+03 3698 Â132% latency_stats.max.rpc_wait_bit_killable.__rpc_wait_for_completion_task.nfs4_do_close.[nfsv4].__nfs4_close.[nfsv4].nfs4_close_sync.[nfsv4].nfs4_close_context.[nfsv4].__put_nfs_open_context.nfs_file_clear_open_context.nfs_file_release.__fput.____fput.task_work_run
> > > 228 Â 34% 3e+03 3103 Â159% latency_stats.max.rpc_wait_bit_killable.__rpc_wait_for_completion_task.nfs4_run_open_task.[nfsv4].nfs4_do_open.[nfsv4].nfs4_atomic_open.[nfsv4].nfs4_file_open.[nfsv4].do_dentry_open.vfs_open.path_openat.do_filp_open.do_sys_open.SyS_open
> > > 270 Â 24% 3e+03 3110 Â162% latency_stats.max.io_schedule.wait_on_page_bit_common.__filemap_fdatawait_range.filemap_write_and_wait_range.nfs_file_fsync.vfs_fsync_range.vfs_fsync.nfs4_file_flush.[nfsv4].filp_close.do_dup2.SyS_dup2.entry_SYSCALL_64_fastpath
> > > 12245 Â110% -7e+03 5367 Â 29% latency_stats.max.call_usermodehelper_exec.__request_module.get_fs_type.do_mount.SyS_mount.entry_SYSCALL_64_fastpath
> > > 927506 Â173% -9e+05 11702 Â100% latency_stats.max.tty_release_struct.tty_release.__fput.____fput.task_work_run.exit_to_usermode_loop.syscall_return_slowpath.entry_SYSCALL_64_fastpath
> > > 7892 Â 54% 3e+04 33793 Â131% latency_stats.sum.sync_inodes_sb.sync_inodes_one_sb.iterate_supers.sys_sync.entry_SYSCALL_64_fastpath
> > > 12030 Â109% 2e+04 33536 Â136% latency_stats.sum.autofs4_wait.autofs4_mount_wait.autofs4_d_manage.follow_managed.lookup_fast.path_openat.do_filp_open.do_sys_open.SyS_open.entry_SYSCALL_64_fastpath
> > > 14311 Â 15% 7e+03 21729 Â116% latency_stats.sum.call_rwsem_down_read_failed.page_lock_anon_vma_read.rmap_walk_anon.rmap_walk.try_to_unmap.migrate_pages.migrate_misplaced_page.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
> > > 8095 Â 22% 7e+03 15421 Â 79% latency_stats.sum.call_rwsem_down_read_failed.rmap_walk_anon.rmap_walk.remove_migration_ptes.migrate_pages.migrate_misplaced_page.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
> > > 495 Â 39% 7e+03 7470 Â131% latency_stats.sum.rpc_wait_bit_killable.__rpc_wait_for_completion_task.nfs4_do_close.[nfsv4].__nfs4_close.[nfsv4].nfs4_close_sync.[nfsv4].nfs4_close_context.[nfsv4].__put_nfs_open_context.nfs_file_clear_open_context.nfs_file_release.__fput.____fput.task_work_run
> > > 3449 Â 18% 7e+03 10001 Â 68% latency_stats.sum.io_schedule.wait_on_page_bit_common.__filemap_fdatawait_range.filemap_fdatawait_keep_errors.sync_inodes_sb.sync_inodes_one_sb.iterate_supers.sys_sync.entry_SYSCALL_64_fastpath
> > > 378 Â123% 6e+03 6438 Â109% latency_stats.sum.io_schedule.__lock_page.find_lock_entry.shmem_getpage_gfp.shmem_fault.__do_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
> > > 386 Â 34% 3e+03 3242 Â153% latency_stats.sum.rpc_wait_bit_killable.__rpc_wait_for_completion_task.nfs4_run_open_task.[nfsv4].nfs4_do_open.[nfsv4].nfs4_atomic_open.[nfsv4].nfs4_file_open.[nfsv4].do_dentry_open.vfs_open.path_openat.do_filp_open.do_sys_open.SyS_open
> > > 413 Â 19% 3e+03 3256 Â156% latency_stats.sum.io_schedule.wait_on_page_bit_common.__filemap_fdatawait_range.filemap_write_and_wait_range.nfs_file_fsync.vfs_fsync_range.vfs_fsync.nfs4_file_flush.[nfsv4].filp_close.do_dup2.SyS_dup2.entry_SYSCALL_64_fastpath
> > > 2874 Â173% -3e+03 0 latency_stats.sum.call_rwsem_down_write_failed.do_unlinkat.SyS_unlink.entry_SYSCALL_64_fastpath
> > > 3683 Â172% -4e+03 5 Â100% latency_stats.sum.call_rwsem_down_read_failed.lookup_slow.walk_component.path_lookupat.filename_lookup.user_path_at_empty.SyS_chmod.entry_SYSCALL_64_fastpath
> > > 12245 Â110% -7e+03 5367 Â 29% latency_stats.sum.call_usermodehelper_exec.__request_module.get_fs_type.do_mount.SyS_mount.entry_SYSCALL_64_fastpath
> > > 7320 Â136% -7e+03 0 latency_stats.sum.call_rwsem_down_write_failed.do_truncate.do_sys_ftruncate.SyS_ftruncate.entry_SYSCALL_64_fastpath
> > > 52297 Â124% -3e+04 26320 Â 16% latency_stats.sum.call_rwsem_down_read_failed.lookup_slow.walk_component.link_path_walk.path_parentat.filename_parentat.do_unlinkat.SyS_unlink.entry_SYSCALL_64_fastpath
> > > 64452 Â 83% -3e+04 31383 Â 30% latency_stats.sum.call_rwsem_down_write_failed.do_rmdir.SyS_rmdir.entry_SYSCALL_64_fastpath
> > > 1867567 Â173% -2e+06 11702 Â100% latency_stats.sum.tty_release_struct.tty_release.__fput.____fput.task_work_run.exit_to_usermode_loop.syscall_return_slowpath.entry_SYSCALL_64_fastpath
> > >
> > >
> > >
> > >
> > > Disclaimer:
> > > Results have been estimated based on internal Intel analysis and are provided
> > > for informational purposes only. Any difference in system hardware or software
> > > design or configuration may affect actual performance.
> > >
> > >
> > > Thanks,
> > > Xiaolong

--
Jeff Layton <jlayton@xxxxxxxxxx>