[RFC PATCH v2 0/4] dynamic indirect call promotion

From: Edward Cree
Date: Fri Feb 01 2019 - 19:06:04 EST


This series introduces 'dynamic_calls', branch trees of static calls (updated
at runtime using text patching), to avoid making indirect calls to common
targets. The basic mechanism is

	if (func == static_key_1.target)
		call_static_key_1(args);
	else if (func == static_key_2.target)
		call_static_key_2(args);
	/* ... */
	else
		(*func)(args); /* typically leads to a retpoline nowadays */

with some additional statistics-gathering to allow periodic relearning of
branch targets. Creating and calling a dynamic call table are each a single
line in the consuming code, although they expand to a nontrivial amount of
data and text in the kernel image.
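A minimal userspace sketch of the basic mechanism follows. It assumes two learned targets fixed at compile time, whereas the series re-patches static calls at runtime, which plain C cannot express. The macro PROMOTED_CALL_2 and the handler_* functions are hypothetical and not part of the patches:

```c
typedef int (*handler_fn)(int);

/* Hypothetical callees standing in for frequently used handlers. */
static int handler_a(int x) { return x + 1; }
static int handler_b(int x) { return x * 2; }
static int handler_c(int x) { return x - 3; }

/*
 * Hypothetical two-entry promotion: compare the pointer against known
 * targets and make a direct call on a match; only a miss pays for a
 * genuine indirect call. Note that @f is evaluated more than once.
 * Here t1/t2 are fixed at compile time; in the series they would be
 * static calls re-patched at runtime.
 */
#define PROMOTED_CALL_2(f, t1, t2, ...)		\
	((f) == (t1) ? t1(__VA_ARGS__) :	\
	 (f) == (t2) ? t2(__VA_ARGS__) :	\
	 (f)(__VA_ARGS__))

static int rx_dispatch(handler_fn func, int arg)
{
	/* The call site itself stays a single line. */
	return PROMOTED_CALL_2(func, handler_a, handler_b, arg);
}
```

With this, rx_dispatch(handler_a, 10) takes the first direct-call branch, while rx_dispatch(handler_c, 10) falls through to the indirect call.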

This is essentially indirect branch prediction, performed in software because
we can't trust hardware to get it right. While the processor may speculate
into the function calls, this is *probably* OK since they are known to be
functions that frequently get called in this path, and thus are much less
likely to contain side-channel information leaks than a completely arbitrary
target PC from a branch target buffer. Moreover, when the speculation is
accurate we positively want to be able to speculate into the callee.

The branch target statistics are collected with percpu variables, counting
both 'hits' on the existing branch targets and 'misses', divided into counts
for up to four specific targets (first-come-first-served) and a catch-all
miss count used once that table is full.
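The per-CPU table described above might look roughly like the following single-CPU sketch. The struct and function names are hypothetical; call targets are recorded as raw addresses (0 meaning an empty slot), and hits are kept as a single counter for brevity. Both are assumptions rather than what the series necessarily does:

```c
#include <stddef.h>
#include <stdint.h>

#define NR_SPECIFIC 4	/* up to four specific miss targets per CPU */

/* Hypothetical per-CPU statistics; the series keeps one set per CPU. */
struct dyncall_stats {
	unsigned long hits;			/* calls matching a promoted target */
	uintptr_t target[NR_SPECIFIC];		/* callee address, 0 = empty slot */
	unsigned long misses[NR_SPECIFIC];	/* misses per specific target */
	unsigned long catchall;			/* misses once the table is full */
};

static void record_hit(struct dyncall_stats *s)
{
	s->hits++;
}

/* Count a miss on @addr, claiming slots first-come-first-served. */
static void record_miss(struct dyncall_stats *s, uintptr_t addr)
{
	for (size_t i = 0; i < NR_SPECIFIC; i++) {
		if (s->target[i] == addr) {
			s->misses[i]++;
			return;
		}
		if (s->target[i] == 0) {	/* free slot: claim it */
			s->target[i] = addr;
			s->misses[i] = 1;
			return;
		}
	}
	s->catchall++;		/* table full and @addr not tracked */
}
```

Because slots are only ever cleared all at once, the first empty slot marks the end of the tracked targets, so the scan can stop there.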

When the total number of specific misses on a CPU reaches 1000, work is
triggered which adds up the counts across all CPUs and chooses the two most
popular call targets to patch into the call path.
If instead the catch-all miss count reaches 1000, the counts and specific
targets for that CPU are discarded, since either the call site's targets are
too unpredictable (lots of low-frequency callees rather than a few dominant
ones) or the targets that happened to populate the table were unpopular ones.
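The two thresholds and the relearning choice can be sketched as below. This is illustrative only: the names are hypothetical, the cross-CPU sum runs over a plain array rather than real percpu data, and it ranks only the specific-miss targets, since how the series weighs hits on the currently promoted targets is not spelled out above:

```c
#include <stdint.h>
#include <string.h>

#define MAX_CPUS	4	/* sketch-sized; must be >= ncpus below */
#define NR_SPECIFIC	4
#define THRESHOLD	1000	/* both thresholds in the description */

struct miss_table {
	uintptr_t target[NR_SPECIFIC];	/* callee address, 0 = empty slot */
	unsigned long count[NR_SPECIFIC];
	unsigned long catchall;
};

enum relearn_action { ACT_NONE, ACT_RELEARN, ACT_DISCARD };

/* Decide what one CPU's table calls for after a miss. */
static enum relearn_action check_thresholds(const struct miss_table *t)
{
	unsigned long specific = 0;

	for (int i = 0; i < NR_SPECIFIC; i++)
		specific += t->count[i];
	if (specific >= THRESHOLD)
		return ACT_RELEARN;	/* queue work to re-pick targets */
	if (t->catchall >= THRESHOLD)
		return ACT_DISCARD;	/* too unpredictable, or unlucky */
	return ACT_NONE;
}

/* Throw away this CPU's counts and specific targets. */
static void discard_table(struct miss_table *t)
{
	memset(t, 0, sizeof(*t));
}

/* Sum per-target counts over all CPUs; return the two largest in best[]. */
static void choose_top_two(const struct miss_table *tables, int ncpus,
			   uintptr_t best[2])
{
	uintptr_t tgt[MAX_CPUS * NR_SPECIFIC];
	unsigned long sum[MAX_CPUS * NR_SPECIFIC];
	unsigned long b0 = 0, b1 = 0;
	int n = 0;

	for (int c = 0; c < ncpus; c++) {
		for (int i = 0; i < NR_SPECIFIC; i++) {
			uintptr_t a = tables[c].target[i];
			int j = 0;

			if (!a)
				continue;
			while (j < n && tgt[j] != a)	/* already seen? */
				j++;
			if (j == n) {			/* new target */
				tgt[n] = a;
				sum[n++] = 0;
			}
			sum[j] += tables[c].count[i];
		}
	}

	best[0] = best[1] = 0;			/* 0 = no candidate */
	for (int j = 0; j < n; j++) {
		if (sum[j] > b0) {
			best[1] = best[0];
			b1 = b0;
			best[0] = tgt[j];
			b0 = sum[j];
		} else if (sum[j] > b1) {
			best[1] = tgt[j];
			b1 = sum[j];
		}
	}
}
```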

To ensure that the static key target does not change between the if () check
and the call, the whole dynamic_call must take place in an RCU read-side
critical section. Since the callee does not know it is being called in this
manner, that section then lasts at least until the callee returns. At
re-learning time, the patching is done with the help of a static_key, which
switches callers off the dynamic_call path, and RCU synchronisation, which
ensures none are still on it. In cases where RCU cannot be used (e.g. because
some callees themselves need to wait for an RCU grace period), it might be
possible to add a variant that uses synchronize_rcu_tasks() when updating,
but this series does not attempt this.
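The ordering requirements of this update protocol can be modelled in userspace with C11 atomics. This is a crude stand-in, not the series' implementation: a shared reader count replaces RCU (real RCU readers touch no shared state, and unlike a grace period this busy-wait can be starved by a continuous stream of new readers), an atomic flag replaces the static_key, an atomic pointer replaces the static call, and a single updater is assumed. All names are hypothetical:

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef int (*handler_fn)(int);

/* Hypothetical example callees. */
static int add_one(int x) { return x + 1; }
static int twice(int x) { return x * 2; }

/* Stand-in for the static_key gating the dynamic_call fast path. */
static atomic_bool fast_path_on;
/* Crude stand-in for RCU: count of callers currently inside dispatch(). */
static atomic_int readers;
/* Stand-in for the static call's currently patched target. */
static _Atomic(handler_fn) promoted;

static int dispatch(handler_fn func, int arg)
{
	int ret;

	atomic_fetch_add(&readers, 1);		/* models rcu_read_lock() */
	if (atomic_load(&fast_path_on) && func == atomic_load(&promoted)) {
		/* Second read models calling through the static call. */
		handler_fn target = atomic_load(&promoted);
		ret = target(arg);
	} else {
		ret = func(arg);		/* plain indirect call */
	}
	atomic_fetch_sub(&readers, 1);		/* models rcu_read_unlock() */
	return ret;
}

/* Single updater only, e.g. one relearning work item. */
static void repromote(handler_fn new_target)
{
	atomic_store(&fast_path_on, false);	/* switch callers off the fast path */
	while (atomic_load(&readers) != 0)	/* models synchronize_rcu() */
		;
	atomic_store(&promoted, new_target);	/* models re-patching */
	atomic_store(&fast_path_on, true);	/* switch the fast path back on */
}
```

The fast path reads the target twice, once for the comparison and once for the call, just as the if () check and the static call are separate in the real code; the flag-off, wait-for-readers, repatch, flag-on sequence is what keeps both reads in agreement.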

The dynamic_calls created by this series are opt-in, partly because of the
above-mentioned rcu_read_lock requirement.

My attempts to measure the performance impact of dynamic_calls have been
inconclusive; the effects on an RX-side UDP packet rate test were within
±1.5% and nowhere near statistical significance (p around 0.2-0.3 with n=6
in a Welch t-test). This could mean that dynamic_calls are ineffective,
but it could also mean that many more sites need converting before any gain
shows up, or it could just mean that my testing was insufficiently sensitive
or measuring the wrong thing. Given these poor results, this series is
clearly not 'ready', hence the RFC tags, but hopefully it will inform the
discussion in this area.

As before, this series depends on Josh's "static calls" patch series (v3 this
time). My testing was done with out-of-line static calls, since the inline
implementation led to crashes; I have not yet determined whether they were
the fault of my patch or of the static calls series.

Edward Cree (4):
static_call: add indirect call promotion (dynamic_call) infrastructure
net: core: use a dynamic_call for pt_prev->func() in RX path
net: core: use a dynamic_call for dst_input
net: core: use a dynamic_call for pt_prev->list_func() in list RX path

include/linux/dynamic_call.h | 300 +++++++++++++++++++++++++++++++++++++++++++
include/net/dst.h | 5 +-
init/Kconfig | 11 ++
kernel/Makefile | 1 +
kernel/dynamic_call.c | 131 +++++++++++++++++++
net/core/dev.c | 18 ++-
net/core/dst.c | 2 +
7 files changed, 463 insertions(+), 5 deletions(-)
create mode 100644 include/linux/dynamic_call.h
create mode 100644 kernel/dynamic_call.c