Re: [PATCH] kallsyms: strip LTO suffixes from static functions

From: Fangrui Song
Date: Tue Jun 22 2021 - 16:18:31 EST


On 2021-06-22, 'Nick Desaulniers' via Clang Built Linux wrote:
Similar to:
commit 8b8e6b5d3b01 ("kallsyms: strip ThinLTO hashes from static
functions")

It's very common for compilers to modify the symbol name for static
functions as part of optimizing transformations. That makes hooking
static functions (that weren't inlined or DCE'd) with kprobes difficult.

Full LTO uses a different mangling scheme than thin LTO; full LTO
imports all code into effectively one big translation unit. It must
rename static functions to prevent collisions. Strip off these suffixes
so that we can continue to hook such static functions.

See below. The message needs a change.

I can comment on the LTO side, but a maintainer needs to check the
kernel-side logic.

Reviewed-by: Fangrui Song <maskray@xxxxxxxxxx>

Reported-by: KE.LI(Lieke) <like1@xxxxxxxx>
Tested-by: KE.LI(Lieke) <like1@xxxxxxxx>
Signed-off-by: Nick Desaulniers <ndesaulniers@xxxxxxxxxx>
---
kernel/kallsyms.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 4067564ec59f..14cf3a6474de 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -188,6 +188,24 @@ static inline bool cleanup_symbol_name(char *s)

return res != NULL;
}
+#elif defined(CONFIG_LTO_CLANG_FULL)
+/*
+ * LLVM mangles static functions for full LTO so that two static functions with
+ * the same identifier do not collide when all code is combined into one
+ * module. The scheme used converts references to foo into
+ * foo.llvm.974640843467629774, for example. This can break hooking of static
+ * functions with kprobes.
+ */

The comment should say ThinLTO instead.

The .llvm.123 suffix comes from global scope promotion of local linkage
symbols, and the scheme is ThinLTO specific. When a local linkage symbol
is imported into multiple translation units and then compiled into
different object files, promotion ensures that the copies can be
deduplicated at link time. This matters for code size and for
correctness when the function address is taken.

Regular LTO (sometimes called full LTO) uses the ordinary name.\d+
renaming scheme, i.e. the original name followed by a dot and digits.
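
To make the distinction concrete, here is a minimal userspace sketch (not
kernel code) that tells the two suffix shapes apart. The names
all_digits(), classify_suffix() and enum suffix_kind are hypothetical, and
the sketch assumes the ThinLTO hash is printed as decimal digits, as in the
foo.llvm.974640843467629774 example from the quoted comment.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if p is a non-empty run of decimal digits. */
static bool all_digits(const char *p)
{
	if (*p == '\0')
		return false;
	for (; *p; p++)
		if (!isdigit((unsigned char)*p))
			return false;
	return true;
}

/* Hypothetical classification of the suffix shapes discussed above. */
enum suffix_kind { SUFFIX_NONE, SUFFIX_THINLTO, SUFFIX_NUMERIC };

static enum suffix_kind classify_suffix(const char *name)
{
	const char *p;

	assert(name != NULL);

	/* ThinLTO promotion shape: foo.llvm.<digits> (decimal assumed). */
	p = strstr(name, ".llvm.");
	if (p && all_digits(p + strlen(".llvm.")))
		return SUFFIX_THINLTO;

	/* Plain numeric rename shape: foo.<digits> */
	p = strrchr(name, '.');
	if (p && p != name && all_digits(p + 1))
		return SUFFIX_NUMERIC;

	return SUFFIX_NONE;
}
```

With this sketch, a check for ".llvm." alone would classify foo.123 as
having no suffix to strip, which is the mismatch with the commit message
pointed out above.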

+static inline bool cleanup_symbol_name(char *s)
+{
+ char *res;
+
+ res = strstr(s, ".llvm.");
+ if (res)
+ *res = '\0';
+
+ return res != NULL;
+}
#else
static inline bool cleanup_symbol_name(char *s) { return false; }
#endif
--
2.32.0.288.g62a8d224e6-goog

I wonder whether it makes sense to strip all `.something` suffixes.
For example, the recent -funique-internal-linkage-names (which can
improve sample profile accuracy) uses the `.__uniq.1234` scheme.

Function specialization/clones can create arbitrary `.123` suffixes.
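
A rough userspace sketch of what stripping every `.something` suffix could
look like, purely as an illustration of the idea. strip_dot_suffix() is a
hypothetical name, and whether truncating at the first '.' is safe for all
kernel symbols is an open question that this sketch does not answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical variant of cleanup_symbol_name(): truncate the name at the
 * first '.', which covers .llvm.123, .__uniq.1234 and plain .123 alike.
 * Returns true if the name was modified.
 */
static bool strip_dot_suffix(char *s)
{
	char *res;

	assert(s != NULL);

	/* Skip a leading '.' so a name that starts with one is not emptied. */
	res = strchr(s[0] == '.' ? s + 1 : s, '.');
	if (res)
		*res = '\0';

	return res != NULL;
}
```

Under these assumptions, foo.llvm.974640843467629774, foo.__uniq.1234 and
foo.123 would all be reduced to foo, while a name without any '.' is left
untouched.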
