Re: [lkp] [x86/hweight] 65ea11ec6a: will-it-scale.per_process_ops 9.3% improvement

From: H. Peter Anvin
Date: Tue Aug 16 2016 - 12:59:52 EST


On August 16, 2016 7:26:43 AM PDT, kernel test robot <xiaolong.ye@xxxxxxxxx> wrote:
>
>FYI, we noticed a 9.3% improvement of will-it-scale.per_process_ops due
>to commit:
>
>commit 65ea11ec6a82b1d44aba62b59e9eb20247e57c6e ("x86/hweight: Don't clobber %rdi")
>https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>
>in testcase: will-it-scale
>on test machine: 32 threads Sandy Bridge-EP with 64G memory
>with following parameters:
>
> test: unix1
> cpufreq_governor: performance
>
>
>Disclaimer:
>Results have been estimated based on internal Intel analysis and are provided
>for informational purposes only. Any difference in system hardware or software
>design or configuration may affect actual performance.
>
>Details are as below:
>-------------------------------------------------------------------------------------------------->
>
>
>To reproduce:
>
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml # job file is attached in this email
> bin/lkp run job.yaml
>
>=========================================================================================
>compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
>gcc-6/performance/x86_64-rhel/debian-x86_64-2015-02-07.cgz/lkp-sb03/unix1/will-it-scale
>
>commit:
> v4.8-rc1
> 65ea11ec6a ("x86/hweight: Don't clobber %rdi")
>
>        v4.8-rc1 65ea11ec6a82b1d44aba62b59e
>---------------- --------------------------
>       fail:runs  %reproduction    fail:runs
>           |             |             |
>          1:8          -12%            :4     last_state.is_incomplete_run
>          4:8          -50%            :4     kmsg.DHCP/BOOTP:Reply_not_for_us,op[#]xid[#]
>          7:8          -88%            :4     kmsg.drm:drm_edid_block_valid[drm]]*ERROR*EDID_checksum_is_invalid,remainder_is
>          7:8          -88%            :4     kmsg.i8042:Can't_read_CTR_while_initializing_i8042
>         %stddev     %change         %stddev
>             \          |                \
>   1063041 ±  0%      +9.3%    1161810 ±  0%  will-it-scale.per_process_ops
>    976004 ±  0%      +9.0%    1063615 ±  0%  will-it-scale.per_thread_ops
>      0.57 ±  0%      -6.7%       0.53 ±  1%  will-it-scale.scalability
>    175.96 ±  0%      +8.0%     190.10 ±  0%  will-it-scale.time.user_time
>      0.00 ± 20%     -31.5%       0.00 ± 26%  sched_debug.cpu.next_balance.stddev
>    101.14 ± 11%   +9639.4%       9850 ±121%  latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_execute.rpc_run_task.nfs4_call_sync_sequence.[nfsv4]._nfs4_proc_getattr.[nfsv4].nfs4_proc_getattr.[nfsv4].__nfs_revalidate_inode.nfs_do_access.nfs_permission.__inode_permission.inode_permission
>    148.57 ± 15%  +57704.4%      85880 ±125%  latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_execute.rpc_run_task.nfs4_call_sync_sequence.[nfsv4]._nfs4_proc_getattr.[nfsv4].nfs4_proc_getattr.[nfsv4].__nfs_revalidate_inode.nfs_do_access.nfs_permission.__inode_permission.inode_permission
>    886.00 ± 14%   +9757.0%      87333 ±123%  latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_execute.rpc_run_task.nfs4_call_sync_sequence.[nfsv4]._nfs4_proc_getattr.[nfsv4].nfs4_proc_getattr.[nfsv4].__nfs_revalidate_inode.nfs_do_access.nfs_permission.__inode_permission.inode_permission
> 3.041e+12 ±  1%      +7.4%  3.267e+12 ±  1%  perf-stat.branch-instructions
>      0.31 ±  0%     -86.6%       0.04 ±  4%  perf-stat.branch-miss-rate
> 9.456e+09 ±  1%     -85.6%  1.364e+09 ±  3%  perf-stat.branch-misses
> 5.147e+12 ±  1%      +5.4%  5.427e+12 ±  1%  perf-stat.dTLB-loads
> 3.869e+12 ±  0%      +6.7%  4.128e+12 ±  1%  perf-stat.dTLB-stores
>     29.02 ± 13%    +223.2%      93.80 ±  0%  perf-stat.iTLB-load-miss-rate
> 2.353e+08 ± 21%    +733.0%   1.96e+09 ±  0%  perf-stat.iTLB-load-misses
>   5.7e+08 ±  9%     -77.2%  1.297e+08 ± 10%  perf-stat.iTLB-loads
> 1.696e+13 ±  0%      +6.9%  1.814e+13 ±  0%  perf-stat.instructions
>     75030 ± 18%     -87.7%       9251 ±  1%  perf-stat.instructions-per-iTLB-miss
>      1.04 ±  0%      +7.6%       1.12 ±  1%  perf-stat.ipc
>  24064971 ±  3%      -6.6%   22469931 ±  3%  perf-stat.node-load-misses
>  53705459 ±  1%      -3.1%   52034054 ±  2%  perf-stat.node-loads
>      7.32 ±  5%     +23.3%       9.03 ±  4%  perf-profile.cycles.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg
>      1.29 ±  4%     +11.7%       1.44 ±  5%  perf-profile.cycles.__fdget_pos.sys_write.entry_SYSCALL_64_fastpath
>      1.15 ±  4%     +12.1%       1.29 ±  4%  perf-profile.cycles.__fget.__fget_light.__fdget_pos.sys_write.entry_SYSCALL_64_fastpath
>      1.22 ±  5%     +11.7%       1.36 ±  5%  perf-profile.cycles.__fget_light.__fdget_pos.sys_write.entry_SYSCALL_64_fastpath
>      1.86 ±  4%     -58.4%       0.77 ±  7%  perf-profile.cycles.__inode_security_revalidate.selinux_file_permission.security_file_permission.rw_verify_area.vfs_write
>      0.00 ± -1%      +Inf%       2.65 ±  5%  perf-profile.cycles.__kmalloc_node_track_caller.__kmalloc_reserve.isra.33.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
>      1.89 ±  8%    -100.0%       0.00 ± -1%  perf-profile.cycles.__kmalloc_node_track_caller.__kmalloc_reserve.isra.35.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
>      0.00 ± -1%      +Inf%       3.55 ±  5%  perf-profile.cycles.__kmalloc_reserve.isra.33.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
>      2.52 ±  8%    -100.0%       0.00 ± -1%  perf-profile.cycles.__kmalloc_reserve.isra.35.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
>      1.43 ±  4%     -91.1%       0.13 ±173%  perf-profile.cycles.__might_sleep.__inode_security_revalidate.selinux_file_permission.security_file_permission.rw_verify_area
>      1.15 ±  5%     -65.7%       0.40 ± 57%  perf-profile.cycles.__might_sleep.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>      1.33 ±  7%     +14.0%       1.52 ±  2%  perf-profile.cycles._raw_spin_lock_irqsave.skb_queue_tail.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
>      1.37 ±  6%     +20.4%       1.65 ±  3%  perf-profile.cycles._raw_spin_lock_irqsave.skb_unlink.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>      1.09 ±  9%     +15.6%       1.26 ±  5%  perf-profile.cycles._raw_spin_unlock_irqrestore.skb_queue_tail.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
>      1.01 ±  6%     +15.4%       1.17 ±  7%  perf-profile.cycles._raw_spin_unlock_irqrestore.skb_unlink.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>      8.01 ±  6%     +22.5%       9.82 ±  4%  perf-profile.cycles.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
>      7.33 ±  6%     +14.8%       8.42 ±  4%  perf-profile.cycles.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
>      0.98 ±  8%     +15.0%       1.12 ±  4%  perf-profile.cycles.consume_skb.unix_stream_recvmsg.sock_recvmsg.sock_read_iter.__vfs_read
>      1.60 ±  5%     +18.7%       1.91 ±  3%  perf-profile.cycles.copy_from_iter.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
>      2.30 ±  4%     +11.5%       2.56 ±  6%  perf-profile.cycles.entry_SYSCALL_64
>      2.10 ±  3%     +18.1%       2.48 ±  5%  perf-profile.cycles.entry_SYSCALL_64_after_swapgs
>      2.82 ±  7%     -34.6%       1.85 ±  6%  perf-profile.cycles.file_has_perm.selinux_file_permission.security_file_permission.rw_verify_area.vfs_read
>      1.55 ±  6%     +21.3%       1.89 ±  5%  perf-profile.cycles.fput.entry_SYSCALL_64_fastpath
>      1.13 ±  9%     +17.0%       1.32 ±  3%  perf-profile.cycles.kfree.skb_free_head.skb_release_data.skb_release_all.consume_skb
>      0.76 ±  8%     +21.9%       0.93 ±  5%  perf-profile.cycles.kfree_skbmem.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>      0.77 ± 10%     +27.0%       0.98 ±  5%  perf-profile.cycles.ksize.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
>      2.08 ±  6%     -31.5%       1.42 ±  6%  perf-profile.cycles.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
>      0.89 ±  9%     +18.8%       1.06 ±  6%  perf-profile.cycles.mutex_unlock.unix_stream_recvmsg.sock_recvmsg.sock_read_iter.__vfs_read
>      6.80 ±  3%     -19.3%       5.49 ±  3%  perf-profile.cycles.rw_verify_area.vfs_read.sys_read.entry_SYSCALL_64_fastpath
>      5.54 ±  4%     -23.5%       4.24 ±  5%  perf-profile.cycles.rw_verify_area.vfs_write.sys_write.entry_SYSCALL_64_fastpath
>      6.21 ±  4%     -19.5%       5.00 ±  3%  perf-profile.cycles.security_file_permission.rw_verify_area.vfs_read.sys_read.entry_SYSCALL_64_fastpath
>      5.23 ±  4%     -25.6%       3.89 ±  5%  perf-profile.cycles.security_file_permission.rw_verify_area.vfs_write.sys_write.entry_SYSCALL_64_fastpath
>      4.67 ±  4%     -24.1%       3.55 ±  4%  perf-profile.cycles.selinux_file_permission.security_file_permission.rw_verify_area.vfs_read.sys_read
>      4.87 ±  5%     -28.0%       3.51 ±  5%  perf-profile.cycles.selinux_file_permission.security_file_permission.rw_verify_area.vfs_write.sys_write
>      2.43 ±  5%     +29.8%       3.15 ±  3%  perf-profile.cycles.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.__vfs_write
>      1.18 ±  8%     +16.1%       1.36 ±  2%  perf-profile.cycles.skb_free_head.skb_release_data.skb_release_all.consume_skb.unix_stream_read_generic
>      2.60 ±  7%     +15.4%       3.00 ±  3%  perf-profile.cycles.skb_queue_tail.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.__vfs_write
>      6.30 ±  6%     +15.2%       7.26 ±  4%  perf-profile.cycles.skb_release_all.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>      1.45 ±  7%     +19.4%       1.73 ±  2%  perf-profile.cycles.skb_release_data.skb_release_all.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
>      4.63 ±  6%     +14.4%       5.30 ±  5%  perf-profile.cycles.skb_release_head_state.skb_release_all.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
>      1.01 ±  4%     +16.7%       1.18 ±  5%  perf-profile.cycles.skb_set_owner_w.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
>      2.59 ±  6%     +18.2%       3.07 ±  4%  perf-profile.cycles.skb_unlink.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
>      9.66 ±  5%     +21.1%      11.70 ±  3%  perf-profile.cycles.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.__vfs_write
>     25.86 ±  5%     +14.8%      29.68 ±  4%  perf-profile.cycles.sock_sendmsg.sock_write_iter.__vfs_write.vfs_write.sys_write
>      3.88 ±  7%     +13.1%       4.38 ±  5%  perf-profile.cycles.sock_wfree.unix_destruct_scm.skb_release_head_state.skb_release_all.consume_skb
>      4.24 ±  7%     +13.3%       4.80 ±  5%  perf-profile.cycles.unix_destruct_scm.skb_release_head_state.skb_release_all.consume_skb.unix_stream_read_generic
>     21.96 ±  5%     +17.1%      25.71 ±  3%  perf-profile.cycles.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.__vfs_write.vfs_write
>      1.20 ±  6%    -100.0%       0.00 ± -1%  perf-profile.cycles.unix_stream_sendmsg.sock_write_iter.__vfs_write.vfs_write.sys_write
>      2.28 ±  6%     +13.7%       2.60 ±  3%  perf-profile.cycles.unix_write_space.sock_wfree.unix_destruct_scm.skb_release_head_state.skb_release_all
>      3.84 ±  5%     -16.8%       3.20 ±  2%  perf-profile.func.cycles.___might_sleep
>      1.96 ±  7%     +20.8%       2.36 ±  4%  perf-profile.func.cycles.__alloc_skb
>      2.40 ±  4%     +11.3%       2.67 ±  4%  perf-profile.func.cycles.__fget
>      1.30 ±  9%     +48.7%       1.94 ±  4%  perf-profile.func.cycles.__kmalloc_node_track_caller
>      1.05 ±  5%     +12.6%       1.19 ±  7%  perf-profile.func.cycles.__vfs_read
>      0.99 ±  7%     +27.1%       1.26 ±  4%  perf-profile.func.cycles.__vfs_write
>      1.01 ±  5%     -51.9%       0.48 ±  3%  perf-profile.func.cycles._cond_resched
>      2.78 ±  6%     +17.0%       3.25 ±  2%  perf-profile.func.cycles._raw_spin_lock_irqsave
>      2.19 ±  8%     +15.5%       2.53 ±  6%  perf-profile.func.cycles._raw_spin_unlock_irqrestore
>      1.10 ±  8%     +11.2%       1.23 ±  4%  perf-profile.func.cycles.consume_skb
>      0.97 ±  5%     +25.6%       1.22 ±  3%  perf-profile.func.cycles.copy_from_iter
>      2.30 ±  4%     +11.5%       2.56 ±  6%  perf-profile.func.cycles.entry_SYSCALL_64
>      2.10 ±  3%     +18.1%       2.48 ±  5%  perf-profile.func.cycles.entry_SYSCALL_64_after_swapgs
>      2.26 ±  4%     -38.4%       1.39 ±  5%  perf-profile.func.cycles.file_has_perm
>      1.55 ±  6%     +21.3%       1.89 ±  5%  perf-profile.func.cycles.fput
>      1.18 ±  8%     +17.2%       1.38 ±  3%  perf-profile.func.cycles.kfree
>      0.86 ± 10%     +22.0%       1.05 ±  4%  perf-profile.func.cycles.ksize
>      0.90 ±  8%     +18.7%       1.06 ±  5%  perf-profile.func.cycles.mutex_unlock
>      1.91 ±  6%     -13.1%       1.66 ±  3%  perf-profile.func.cycles.selinux_file_permission
>      1.05 ±  5%     +16.7%       1.23 ±  5%  perf-profile.func.cycles.skb_set_owner_w
>      1.66 ±  8%     +16.3%       1.93 ±  7%  perf-profile.func.cycles.sock_wfree
>      2.44 ±  4%     -39.7%       1.47 ±  2%  perf-profile.func.cycles.sock_write_iter
>      4.20 ±  6%     -21.1%       3.32 ±  3%  perf-profile.func.cycles.unix_stream_sendmsg
>      2.35 ±  6%     +14.3%       2.69 ±  3%  perf-profile.func.cycles.unix_write_space
>
>
>
>Thanks,
>Xiaolong

Dang...
--
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.