Re: [PATCH 08/15] x86/alternatives: Teach text_poke_bp() to emulate instructions

From: Peter Zijlstra
Date: Tue Jun 11 2019 - 08:13:53 EST


On Tue, Jun 11, 2019 at 10:03:07AM +0200, Peter Zijlstra wrote:
> On Fri, Jun 07, 2019 at 11:10:19AM -0700, Andy Lutomirski wrote:

> > I am surely missing some kprobe context, but is it really safe to use
> > this mechanism to replace more than one instruction?
>
> I'm not entirely up-to-scratch here, so Masami, please correct me if I'm
> wrong.
>
> So what happens is that arch_prepare_optimized_kprobe() <-
> copy_optimized_instructions() copies however much of the instruction
> stream is required such that we can overwrite the instruction at @addr
> with a 5 byte jump.
>
> arch_optimize_kprobe() then does the text_poke_bp() that replaces the
> instruction @addr with int3, copies the rel jump address and overwrites
> the int3 with jmp.
>
> And I'm thinking the problem is with something like:
>
> @addr: nop nop nop nop nop
>
> We copy out the nops into the trampoline, overwrite the first nop with
> an INT3, overwrite the remaining nops with the rel addr, but oops,
> another CPU can still be executing one of those NOPs, right?
>
> I'm thinking we could fix this by first writing INT3 into all relevant
> instructions, which is going to be messy, given the current code base.

Maybe not that bad; how's something like this?

(completely untested)

---
arch/x86/kernel/alternative.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 0d57015114e7..8f643dabea72 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -24,6 +24,7 @@
#include <asm/tlbflush.h>
#include <asm/io.h>
#include <asm/fixmap.h>
+#include <asm/insn.h>

int __read_mostly alternatives_patched;

@@ -849,6 +850,7 @@ static void do_sync_core(void *info)

static bool bp_patching_in_progress;
static void *bp_int3_handler, *bp_int3_addr;
+static unsigned int bp_int3_length;

int poke_int3_handler(struct pt_regs *regs)
{
@@ -867,7 +869,11 @@ int poke_int3_handler(struct pt_regs *regs)
if (likely(!bp_patching_in_progress))
return 0;

- if (user_mode(regs) || regs->ip != (unsigned long)bp_int3_addr)
+ if (user_mode(regs))
+ return 0;
+
+ if (regs->ip < (unsigned long)bp_int3_addr ||
+ regs->ip >= (unsigned long)bp_int3_addr + bp_int3_length)
return 0;

/* set up the specified breakpoint handler */
@@ -900,9 +906,12 @@ NOKPROBE_SYMBOL(poke_int3_handler);
void text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
{
unsigned char int3 = 0xcc;
+ void *kaddr = addr;
+ struct insn insn;

bp_int3_handler = handler;
bp_int3_addr = (u8 *)addr + sizeof(int3);
+ bp_int3_length = len - sizeof(int3);
bp_patching_in_progress = true;

lockdep_assert_held(&text_mutex);
@@ -913,7 +922,14 @@ void text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
*/
smp_wmb();

- text_poke(addr, &int3, sizeof(int3));
+ do {
+ kernel_insn_init(&insn, kaddr, MAX_INSN_SIZE);
+ insn_get_length(&insn);
+
+ text_poke(kaddr, &int3, sizeof(int3));
+
+ kaddr += insn.length;
+ } while (kaddr < addr + len);

on_each_cpu(do_sync_core, NULL, 1);