Re: [PATCH -tip v3 1/2] kcov: Make runtime functions noinstr-compatible

From: Peter Zijlstra
Date: Wed Jun 17 2020 - 10:50:05 EST


On Wed, Jun 17, 2020 at 04:32:08PM +0200, Marco Elver wrote:
> On Mon, Jun 15, 2020 at 05:20PM +0200, Peter Zijlstra wrote:
> > On Mon, Jun 15, 2020 at 05:03:27PM +0200, Peter Zijlstra wrote:
> >
> > > Yes, I think so. x86_64 needs lib/memcpy_64.S in .noinstr.text then. For
> > > i386 it's an __always_inline inline-asm thing.
> >
> > Bah, I tried writing it without memcpy, but clang inserts memcpy anyway
> > :/
>
> Hmm, __builtin_memcpy() won't help either.
>
> Turns out, Clang 11 got __builtin_memcpy_inline(): https://reviews.llvm.org/D73543
>
> The below works, no more crash on either KASAN or KCSAN with Clang. We
> can test if we have it with __has_feature(__builtin_memcpy_inline)
> (although that's currently not working as expected, trying to fix :-/).
>
> Would a memcpy_inline() be generally useful? It's not just Clang but
> also GCC that isn't entirely upfront about which memcpy is inlined and
> which isn't. If the compiler has __builtin_memcpy_inline(), we can use
> it, otherwise the arch likely has to provide the implementation.
>
> Thoughts?

I had the below, except of course that yields another objtool
complaint, and I was still looking at that.

Does GCC (8, as per the new KASAN thing) have that
__builtin_memcpy_inline()?
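
For the sake of discussion, a memcpy_inline() along those lines could look
roughly like the below. This is only a sketch (kernel context assumed); the
__has_builtin() probe and the rep-movsb fallback are illustrative
assumptions, not something either compiler guarantees today:

#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

#if __has_builtin(__builtin_memcpy_inline)
/* Clang 11+: always expanded inline; the size must be a compile-time constant. */
#define memcpy_inline(dst, src, len)	__builtin_memcpy_inline(dst, src, len)
#else
/*
 * Otherwise the arch has to provide something the compiler cannot turn
 * back into a memcpy() call, e.g. on x86 (sketch only):
 */
static __always_inline void memcpy_inline(void *dst, const void *src, unsigned long len)
{
	asm volatile("rep movsb"
		     : "+D" (dst), "+S" (src), "+c" (len)
		     : : "memory");
}
#endif

Something like that would let noinstr code ask for an inlined copy
explicitly, instead of relying on a particular memcpy() not being
instrumented.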

---
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index af75109485c26..a7d1570905727 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -690,13 +690,13 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
 		(struct bad_iret_stack *)__this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
 
 	/* Copy the IRET target to the temporary storage. */
-	memcpy(&tmp.regs.ip, (void *)s->regs.sp, 5*8);
+	__memcpy(&tmp.regs.ip, (void *)s->regs.sp, 5*8);
 
 	/* Copy the remainder of the stack from the current stack. */
-	memcpy(&tmp, s, offsetof(struct bad_iret_stack, regs.ip));
+	__memcpy(&tmp, s, offsetof(struct bad_iret_stack, regs.ip));
 
 	/* Update the entry stack */
-	memcpy(new_stack, &tmp, sizeof(tmp));
+	__memcpy(new_stack, &tmp, sizeof(tmp));
 
 	BUG_ON(!user_mode(&new_stack->regs));
 	return new_stack;
diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index 56b243b14c3a2..bbcc05bcefadb 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -8,6 +8,8 @@
 #include <asm/alternative-asm.h>
 #include <asm/export.h>
 
+.pushsection .noinstr.text, "ax"
+
 /*
  * We build a jump to memcpy_orig by default which gets NOPped out on
  * the majority of x86 CPUs which set REP_GOOD. In addition, CPUs which
@@ -184,6 +186,8 @@ SYM_FUNC_START_LOCAL(memcpy_orig)
 	retq
 SYM_FUNC_END(memcpy_orig)
 
+.popsection
+
 #ifndef CONFIG_UML
 
 MCSAFE_TEST_CTL