Re: [RFC PATCH 0/5] x86: dynamic indirect call promotion

From: Nadav Amit
Date: Tue Oct 23 2018 - 16:32:27 EST


at 11:36 AM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:

> On 10/17/18 5:54 PM, Nadav Amit wrote:
>>              base     relpoline
>>              ----     ---------
>> nginx        22898    25178 (+10%)
>> redis-ycsb   24523    25486 (+4%)
>> dbench       2144     2103 (+2%)
>
> Just out of curiosity, which indirect branches are the culprits here for
> causing the slowdowns?

So I didn't try to measure exactly which one. There are roughly 500 that
actually "run" in my tests. Initially, I took the silly approach of trying
to patch the C source code using semi-automatically generated Coccinelle
scripts, so I can tell you it is not just a few branches but many. The
network stack is full of function pointers (e.g., tcp_congestion_ops,
tcp_sock_af_ops, dst_ops). The file system also uses many function pointers
(file_operations specifically). Compound pages have a d'tor, and so on.
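
To make the pattern concrete, here is a minimal userspace sketch of a call
through an ops table and of the same call "promoted" to a direct call when
the target matches an expected function. This is not the code from the
patches, only an illustration of the general idea. All names (demo_ops,
demo_read_a, demo_read_b, call_indirect, call_promoted) are hypothetical,
and the likely target is hard-coded here rather than learned at runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table, loosely modeled on structures like file_operations. */
struct demo_ops {
	long (*read)(long pos);
};

static long demo_read_a(long pos) { return pos + 1; }
static long demo_read_b(long pos) { return pos * 2; }

static const struct demo_ops ops_a = { .read = demo_read_a };
static const struct demo_ops ops_b = { .read = demo_read_b };

/* Plain indirect call: with retpolines enabled, each such call goes through a thunk. */
static long call_indirect(const struct demo_ops *ops, long pos)
{
	assert(ops != NULL && ops->read != NULL);
	return ops->read(pos);
}

/*
 * Promoted call: compare the pointer against an assumed likely target and
 * call that target directly on a match; otherwise fall back to the
 * indirect call. The result is the same either way.
 */
static long call_promoted(const struct demo_ops *ops, long pos)
{
	assert(ops != NULL && ops->read != NULL);
	if (ops->read == demo_read_a)
		return demo_read_a(pos);	/* direct call, no indirect branch */
	return ops->read(pos);			/* fallback: indirect call */
}
```

With ops_a the promoted path takes the direct call; with ops_b it falls
back to the indirect call. Both variants return the same value for the same
arguments, so only the cost of the call differs.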

If you want, you can rebuild the kernel without retpolines and run

perf record -e br_inst_exec.taken_indirect_near_call:k (your workload)

For some reason I didn't manage to use PEBS (:ppp) from either the guest or
the host, so my results are a bit skewed (i.e., the sampled location is
usually after the call was taken). Running dbench in the VM gives me the
following "hot spots":

# Samples: 304 of event 'br_inst_exec.taken_indirect_near_call'
# Event count (approx.): 60800912
#
# Overhead  Command  Shared Object            Symbol
# ........  .......  .......................  .............................................
#
     5.26%  :197970  [guest.kernel.kallsyms]  [g] __fget_light
     4.28%  :197969  [guest.kernel.kallsyms]  [g] __fget_light
     3.95%  :197969  [guest.kernel.kallsyms]  [g] dcache_readdir
     3.29%  :197970  [guest.kernel.kallsyms]  [g] next_positive.isra.14
     2.96%  :197970  [guest.kernel.kallsyms]  [g] __do_sys_kill
     2.30%  :197970  [guest.kernel.kallsyms]  [g] apparmor_file_open
     1.97%  :197969  [guest.kernel.kallsyms]  [g] __do_sys_kill
     1.97%  :197969  [guest.kernel.kallsyms]  [g] next_positive.isra.14
     1.97%  :197970  [guest.kernel.kallsyms]  [g] _raw_spin_lock
     1.64%  :197969  [guest.kernel.kallsyms]  [g] __alloc_file
     1.64%  :197969  [guest.kernel.kallsyms]  [g] common_file_perm
     1.64%  :197969  [guest.kernel.kallsyms]  [g] filldir
     1.64%  :197970  [guest.kernel.kallsyms]  [g] do_dentry_open
     1.64%  :197970  [guest.kernel.kallsyms]  [g] kmem_cache_free
     1.32%  :197969  [guest.kernel.kallsyms]  [g] __raw_callee_save___pv_queued_spin_unlock
     1.32%  :197969  [guest.kernel.kallsyms]  [g] __slab_free

Regards,
Nadav