[POC 01/12] Accessing __ro_after_init variables as immediates
From: Rasmus Villemoes
Date: Wed Oct 17 2018 - 18:34:48 EST
[This is on top of 58d20fcbd005 "Merge branch 'x86/grub2'" from the -tip
tree, to have the macros.S mechanism available].
Many variables are initialized at init time and never changed
afterwards. Accesses to such variables can be rewritten so that the
code never actually loads the variable from memory; instead, the value
of the variable gets encoded as an immediate operand. For example, many
code paths do something like
p = kmem_cache_alloc(foo_cachep, GFP_KERNEL);
where foo_cachep is some global variable that is set in an init
function. The theory is that one can avoid the cost of a D$ miss by
having the CPU load the value of foo_cachep directly into %rdi from the
instruction stream. The I$ cost of running the piece of code has to be
paid either way, so fetching the value together with the instructions
removes the separate data access.
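To make "encoded as an immediate operand" concrete, here is a user-space sketch (not part of this patch) of the bytes such an instruction could contain on x86-64. movabs $imm64, %rdi is encoded as REX.W (0x48), opcode 0xB8 plus the register number (7 for %rdi), and the 8-byte immediate in little-endian order. The helpers encode_movabs_rdi() and decode_movabs_rdi() are hypothetical, and the POC's actual templates may well use different instructions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers: build and parse "movabs $imm64, %rdi".
 * Encoding: REX.W (0x48), 0xB8 + reg (rdi = 7 -> 0xBF), imm64 (LE).
 */
#define MOVABS_RDI_LEN 10

static void encode_movabs_rdi(uint8_t insn[MOVABS_RDI_LEN], uint64_t imm)
{
	insn[0] = 0x48;			/* REX.W: 64-bit operand size */
	insn[1] = 0xB8 + 7;		/* mov r64, imm64 with r64 = %rdi */
	for (int i = 0; i < 8; i++)	/* immediate, least significant byte first */
		insn[2 + i] = (uint8_t)(imm >> (8 * i));
}

/* Recover the immediate, i.e. the value the CPU would place in %rdi. */
static uint64_t decode_movabs_rdi(const uint8_t insn[MOVABS_RDI_LEN])
{
	uint64_t imm = 0;

	for (int i = 0; i < 8; i++)
		imm |= (uint64_t)insn[2 + i] << (8 * i);
	return imm;
}
```

For foo_cachep, the patched site would carry the pointer value itself in those 8 bytes, instead of typically doing a RIP-relative load from foo_cachep's address.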
For system hash tables there are typically two __ro_after_init variables
in play: the base, and either a shift (e.g. dcache) or a mask
(e.g. futex). In both cases, the entire computation of the relevant hash
bucket can be implemented with the hash value as the only runtime
input.
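The bucket computation can be illustrated in plain user-space C using the fallback macros from include/linux/rai.h in this patch. Once the patched code has base and shift (or mask) as immediates, only the hash remains a runtime input. The table, its size and the variable names below are hypothetical stand-ins, not the actual dcache or futex tables.

```c
#include <assert.h>
#include <stdint.h>

/* Same definitions as the fallbacks in include/linux/rai.h. */
#define _rai_bucket_shift_fallback(base, shift, hash) (&(base)[(hash) >> (shift)])
#define _rai_bucket_mask_fallback(base, mask, hash) (&(base)[(hash) & (mask)])

struct bucket { void *first; };			/* hypothetical bucket type */

/* Hypothetical 16-bucket table; base/shift/mask are set once at "init". */
static struct bucket table[16];
static struct bucket *table_base = table;
static unsigned int table_shift = 32 - 4;	/* top 4 bits of a 32-bit hash */
static unsigned int table_mask = 16 - 1;	/* low 4 bits of the hash */

/* dcache-style: bucket index taken from the high bits of the hash. */
static struct bucket *bucket_by_shift(uint32_t hash)
{
	return _rai_bucket_shift_fallback(table_base, table_shift, hash);
}

/* futex-style: bucket index taken from the low bits of the hash. */
static struct bucket *bucket_by_mask(uint32_t hash)
{
	return _rai_bucket_mask_fallback(table_base, table_mask, hash);
}
```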
For now, this only aims to provide a POC implementation of the above
access patterns for x86-64, but other patterns one might want to
support are rather easy to identify. For example, pgdir_shift could
give rise to implementing rai_shl() and rai_shr(), and rai_and() is
also an obvious candidate.
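Hypothetical fallbacks for the suggested rai_shl(), rai_shr() and rai_and(), written in the style of the existing _rai_*_fallback macros, might look as follows. The example assumes 4-level paging on x86-64, where the top-level shift is 39 and the top-level table has 512 entries; pgdir_shift_example and pgd_index_example() are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical fallbacks; 'var' is the init-time operand that a
 * patched implementation would encode as an immediate.
 */
#define _rai_shl_fallback(x, var) ((x) << (var))
#define _rai_shr_fallback(x, var) ((x) >> (var))
#define _rai_and_fallback(x, var) ((x) & (var))

/* Hypothetical stand-in for pgdir_shift (4-level paging assumed). */
static unsigned int pgdir_shift_example = 39;

/* Top-level page table index: bits 39..47 of the address. */
static uint64_t pgd_index_example(uint64_t addr)
{
	return _rai_shr_fallback(addr, pgdir_shift_example) & 511;
}
```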
Going a bit further, and no longer restricting ourselves to __ro_after_init
variables, one can imagine implementing rai_gt(), rai_leq() etc. via asm
goto, to allow comparisons to sysctl limits. But while that might be
able to reuse some of this infrastructure, one would need some way to
trigger (another) .text update from the sysctl handler.
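A small model of why such comparisons would need a .text update from the sysctl handler: once the limit lives in the instruction stream, writing the variable alone leaves the compiled comparison using the old value. Everything here (patch_text(), over_limit(), sysctl_write()) is hypothetical and only simulates patching with a second variable.

```c
#include <assert.h>
#include <stdbool.h>

static int sysctl_limit = 100;	/* the variable in memory */
static int patched_limit;	/* models the immediate baked into .text */

/* Stand-in for a .text update: copy the current value into the "immediate". */
static void patch_text(void)
{
	patched_limit = sysctl_limit;
}

/* What a hypothetical rai_gt(x, sysctl_limit) would evaluate after patching. */
static bool over_limit(int x)
{
	return x > patched_limit;
}

/* A sysctl write handler must re-patch, or over_limit() keeps the old limit. */
static void sysctl_write(int new_limit, bool repatch)
{
	sysctl_limit = new_limit;
	if (repatch)
		patch_text();
}
```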
I'm not enforcing that referenced variables are actually
__ro_after_init, partly because many of the obvious subjects are merely
__read_mostly, partly to be able to change some test variables
deliberately and check that rai_load() still returns the initial value.
The prefix rai_ is probably awful, but seemed to be an available
three-letter acronym. Suggestions for better naming are very welcome.
Implementation-wise, each access to a rai variable that should be
patched shortly after init needs to be annotated using one of the rai_*
macros. Doing anything more automatic would likely require a gcc plugin,
and I'm not sure all read accesses to rai variables from non-init code
should necessarily be patched. I'd really like kmalloc(128, GFP_KERNEL)
to do a rai_load() of the appropriate kmalloc cache, but one would likely
run into some __builtin_constant_p trouble.
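Annotating the example from the top of this mail would look roughly like the following. Without CONFIG_ARCH_HAS_RAI (and in modules), the header maps rai_load(var) to _rai_load_fallback(var), which is just (var), so the annotation costs nothing there. struct fake_cache and fake_cache_alloc() are hypothetical stand-ins for struct kmem_cache and kmem_cache_alloc(), so that the sketch compiles in user space.

```c
#include <assert.h>
#include <stddef.h>

/* Fallback path of include/linux/rai.h (no CONFIG_ARCH_HAS_RAI, no debug). */
#define _rai_load_fallback(var) (var)
#define rai_load _rai_load_fallback

/* Hypothetical stand-ins for struct kmem_cache and kmem_cache_alloc(). */
struct fake_cache { size_t size; int nr_allocs; };
static struct fake_cache foo_cache = { .size = 64 };
static struct fake_cache *foo_cachep;	/* set once during init */

static void *fake_cache_alloc(struct fake_cache *c)
{
	c->nr_allocs++;
	return c;	/* placeholder: a real allocator returns a new object */
}

static void foo_init(void)
{
	foo_cachep = &foo_cache;
}

/* The annotated access: this read of foo_cachep is the patch site. */
static void *foo_alloc(void)
{
	return fake_cache_alloc(rai_load(foo_cachep));
}
```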
At each such access, we create four pieces of data:
- a template with the right instructions for patching in, but with
  dummy immediates;
- a thunk which may be slow and stupid, but which computes the correct
  result until patching is done (it is also used as an int3 handler)
  and is careful not to clobber any registers;
- a short piece of .text that simply jumps to the thunk, padded with
  nops to make room for the full template;
- a struct describing the type of access, the variables involved, and
  where to find the template, the thunk and the instruction to be
  patched.
I'm sure some of this metadata can eventually be discarded with
__initdata, but for now I'm keeping it simple. That's not a big deal
while there are only a handful of core users, but if the kmalloc()
idea gets implemented, there will be many more rai_entry structs.
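A user-space model of how these pieces fit together for a single 8-byte rai_load() site might look like this. It is a sketch under simplifying assumptions: the jump-to-thunk state is represented by a flag rather than real jmp and nop bytes, patching is a plain memcpy() instead of a live .text update (which in the kernel presumably goes through the int3 path mentioned above), and struct rai_entry_model is a hypothetical layout, not the struct added by this series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SITE_LEN 10	/* room for "movabs $imm64, %rdi" */

/* Hypothetical per-site metadata. */
struct rai_entry_model {
	const uint64_t *var;		/* variable whose value gets inlined */
	const uint8_t *template_insn;	/* right opcode, dummy immediate */
	uint8_t site[SITE_LEN];		/* the bytes being patched */
	int patched;			/* 0: site still "jumps" to the thunk */
};

/* Template: movabs $0, %rdi; the immediate is filled in at patch time. */
static const uint8_t movabs_rdi_template[SITE_LEN] = { 0x48, 0xBF };

/* The thunk: slow, but always correct because it reads memory. */
static uint64_t thunk(const struct rai_entry_model *e)
{
	return *e->var;
}

/* Copy the template over the site and fill in the current value. */
static void patch_site(struct rai_entry_model *e)
{
	uint64_t v = *e->var;

	memcpy(e->site, e->template_insn, SITE_LEN);
	for (int i = 0; i < 8; i++)
		e->site[2 + i] = (uint8_t)(v >> (8 * i));
	e->patched = 1;
}

/* "Execute" the site: use the thunk until patched, then the immediate. */
static uint64_t run_site(const struct rai_entry_model *e)
{
	uint64_t imm = 0;

	if (!e->patched)
		return thunk(e);
	for (int i = 0; i < 8; i++)
		imm |= (uint64_t)e->site[2 + i] << (8 * i);
	return imm;
}
```

Changing the variable after patch_site() does not affect run_site(), which mirrors the behaviour described above for deliberately modified test variables.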
I have no idea how to benchmark this, or whether it is worth it at all.
Any micro-benchmark would probably just keep the variable in L1 cache,
but if the variable is accessed so rarely that it is no longer in L1,
that extra cache miss is hardly noticeable.
Comments? Flames?
Signed-off-by: Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx>
---
include/linux/rai.h | 83 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 83 insertions(+)
create mode 100644 include/linux/rai.h
diff --git a/include/linux/rai.h b/include/linux/rai.h
new file mode 100644
index 000000000000..e839454000ee
--- /dev/null
+++ b/include/linux/rai.h
@@ -0,0 +1,83 @@
+#ifndef _LINUX_RAI_H
+#define _LINUX_RAI_H
+
+/*
+ * These document the behaviour any arch implementation of _rai_*
+ * should have, and can be used by those in cases the arch does not
+ * want to handle (e.g. _rai_load of a 2-byte quantity).
+ */
+#define _rai_load_fallback(var) (var)
+#define _rai_bucket_shift_fallback(base, shift, hash) (&(base)[(hash) >> (shift)])
+#define _rai_bucket_mask_fallback(base, mask, hash) (&(base)[(hash) & (mask)])
+
+#ifdef CONFIG_ARCH_HAS_RAI
+#include <asm/rai.h>
+void update_rai_access(void);
+#else
+static inline void update_rai_access(void) {}
+#endif
+
+#ifdef MODULE /* don't bother with modules for now */
+#undef _rai_load
+#undef _rai_bucket_shift
+#undef _rai_bucket_mask
+#endif
+
+/* Make sure all _rai_* are defined. */
+#ifndef _rai_load
+#define _rai_load _rai_load_fallback
+#endif
+#ifndef _rai_bucket_shift
+#define _rai_bucket_shift _rai_bucket_shift_fallback
+#endif
+#ifndef _rai_bucket_mask
+#define _rai_bucket_mask _rai_bucket_mask_fallback
+#endif
+
+
+/*
+ * The non-underscored rai_* are property of this header, so that it
+ * can do tricks like defining debugging versions. Usually, it just
+ * defines rai_foo as _rai_foo, with the latter being guaranteed to be
+ * defined by the above logic.
+ */
+#if defined(CONFIG_RAI_DEBUG)
+
+#include <linux/bug.h>
+
+#define rai_warn(what, expect, got) \
+ WARN_ONCE((expect) != (got), \
+ "%s:%d: %s() returned %*phN, expected %*phN\n", \
+ __FILE__, __LINE__, what, \
+ (int)sizeof(got), &(got), \
+ (int)sizeof(expect), &(expect))
+
+#define rai_load(var) ({ \
+ typeof(var) v1 = _rai_load_fallback(var); \
+ typeof(var) v2 = _rai_load(var); \
+ rai_warn("rai_load", v1, v2); \
+ (v1); /* chicken */ \
+ })
+
+#define rai_bucket_shift(base, shift, hash) ({ \
+ typeof(hash) __rai_h = (hash); \
+ typeof(base) b1 = _rai_bucket_shift_fallback(base, shift, __rai_h); \
+ typeof(base) b2 = _rai_bucket_shift(base, shift, __rai_h); \
+ rai_warn("rai_bucket_shift", b1, b2); \
+ (b1); \
+ })
+
+#define rai_bucket_mask(base, mask, hash) ({ \
+ typeof(hash) __rai_h = (hash); \
+ typeof(base) b1 = _rai_bucket_mask_fallback(base, mask, __rai_h); \
+ typeof(base) b2 = _rai_bucket_mask(base, mask, __rai_h); \
+ rai_warn("rai_bucket_mask", b1, b2); \
+ (b1); \
+ })
+#else
+#define rai_load _rai_load
+#define rai_bucket_shift _rai_bucket_shift
+#define rai_bucket_mask _rai_bucket_mask
+#endif
+
+#endif /* _LINUX_RAI_H */
--
2.19.1.6.gbde171bbf5