Re: [PATCH V4 05/13] perf/x86: Add perf text poke events for kprobes

From: Adrian Hunter
Date: Thu Mar 26 2020 - 03:43:09 EST


On 26/03/20 3:58 am, Masami Hiramatsu wrote:
> On Tue, 24 Mar 2020 13:21:50 +0100
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
>> We optimize only after having already installed a regular probe, that
>> is, what we're actually doing here is replacing INT3 with a JMP.d32. But
>> the above will make it appear as if we're replacing the original text
>> with a JMP.d32. Which doesn't make sense, since we've already poked an
>> INT3 there and that poke will have had a corresponding
>> perf_event_text_poke(), right? (except you didn't, see below)
>>
>> At this point we'll already have constructed the optprobe trampoline,
>> which contains however much of the original instruction (in whole) as
>> will be overwritten by our 5 byte JMP.d32. And IIUC, we'll have a
>> perf_event_text_poke() event for the whole of that already -- except I
>> can't find that in the patches (again, see below).
>
> Thanks, Peter, for pointing it out.
>
>>
>>> @@ -454,9 +463,16 @@ void arch_optimize_kprobes(struct list_head *oplist)
>>>   */
>>>  void arch_unoptimize_kprobe(struct optimized_kprobe *op)
>>>  {
>>> +	u8 old[POKE_MAX_OPCODE_SIZE];
>>> +	u8 new[POKE_MAX_OPCODE_SIZE] = { op->kp.opcode, };
>>> +	size_t len = INT3_INSN_SIZE + DISP32_SIZE;
>>> +
>>> +	memcpy(old, op->kp.addr, len);
>>>  	arch_arm_kprobe(&op->kp);
>>>  	text_poke(op->kp.addr + INT3_INSN_SIZE,
>>>  		  op->optinsn.copied_insn, DISP32_SIZE);
>>> +	memcpy(new + INT3_INSN_SIZE, op->optinsn.copied_insn, DISP32_SIZE);
>>
>> And then this is 'wrong' too. You've not written the original
>> instruction, you've just written an INT3.
>>
>>> +	perf_event_text_poke(op->kp.addr, old, len, new, len);
>>>  	text_poke_sync();
>>>  }
>>
>>
>> So how about something like the below; with it you'll get 6 text_poke
>> events:
>>
>> 1: old0 -> INT3
>>
>> // kprobe active
>>
>> 2: NULL -> optprobe_trampoline
>> 3: INT3,old1,old2,old3,old4 -> JMP32
>>
>> // optprobe active
>>
>> 4: JMP32 -> INT3,old1,old2,old3,old4
>> 5: optprobe_trampoline -> NULL
>>
>> // kprobe active
>>
>> 6: INT3 -> old0
>>
>>
>>
>> Masami, did I get this all right?
>
> Yes, you understand correctly. There is also the boosted kprobe, which
> runs probe.ainsn.insn directly and jumps back to the original location.
> I guess it will also disturb Intel PT.
>
> 0: NULL -> probe.ainsn.insn (if ainsn.boostable && !kp.post_handler)
>
>> 1: old0 -> INT3
>>
> // boosted kprobe active
>>
>> 2: NULL -> optprobe_trampoline
>> 3: INT3,old1,old2,old3,old4 -> JMP32
>>
>> // optprobe active
>>
>> 4: JMP32 -> INT3,old1,old2,old3,old4
>
> // optprobe disabled and kprobe active (this sometimes goes back to 3)
>
>> 5: optprobe_trampoline -> NULL
>>
> // boosted kprobe active
>>
>> 6: INT3 -> old0
>
> 7: probe.ainsn.insn -> NULL (if ainsn.boostable && !kp.post_handler)
>
> So you'll get 8 events at most.
>
> Adrian, do you also need to trace the buffer which is used for
> single stepping? If so, as you did, we need to always trace
> p->ainsn.insn.

Peter's simplification (thanks for that, I will test it) didn't address
that aspect, but it was covered in the original patch in the chunk
below. That will also be included in the next version.

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 579d30e91a36..12ea05d923ec 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -33,6 +33,7 @@
 #include <linux/hardirq.h>
 #include <linux/preempt.h>
 #include <linux/sched/debug.h>
+#include <linux/perf_event.h>
 #include <linux/extable.h>
 #include <linux/kdebug.h>
 #include <linux/kallsyms.h>
@@ -470,6 +471,9 @@ static int arch_copy_kprobe(struct kprobe *p)
 	/* Also, displacement change doesn't affect the first byte */
 	p->opcode = buf[0];

+	p->ainsn.tp_len = len;
+	perf_event_text_poke(p->ainsn.insn, NULL, 0, buf, len);
+
 	/* OK, write back the instruction(s) into ROX insn buffer */
 	text_poke(p->ainsn.insn, buf, len);

@@ -514,6 +518,9 @@ void arch_disarm_kprobe(struct kprobe *p)
 void arch_remove_kprobe(struct kprobe *p)
 {
 	if (p->ainsn.insn) {
+		/* Record the perf event before freeing the slot */
+		perf_event_text_poke(p->ainsn.insn, p->ainsn.insn,
+				     p->ainsn.tp_len, NULL, 0);
 		free_insn_slot(p->ainsn.insn, p->ainsn.boostable);
 		p->ainsn.insn = NULL;
 	}
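
For context on the consumer side (not part of this patch): a decoder such
as the Intel PT decoder keeps its own copy of the kernel text and applies
each text poke event to it, so that decoding matches the instructions that
actually executed. The sketch below only illustrates that idea; the record
layout and the names (text_poke_rec, apply_text_poke, the shadow buffer)
are made up for the example and are not the real perf ABI or tools code.

/*
 * Illustrative only: a simplified text-poke record and how a decoder
 * might apply it to a private shadow copy of kernel text.  The struct
 * is a stand-in for the idea, not the actual PERF_RECORD_TEXT_POKE
 * layout.
 */
#include <stdint.h>
#include <string.h>

struct text_poke_rec {
	uint64_t	addr;		/* address that was patched */
	uint16_t	old_len;	/* number of old bytes */
	uint16_t	new_len;	/* number of new bytes */
	const uint8_t	*old_bytes;	/* bytes before the poke */
	const uint8_t	*new_bytes;	/* bytes after the poke */
};

/*
 * Apply one poke to the decoder's shadow copy of kernel text.
 * 'shadow_base' is the kernel address corresponding to shadow[0].
 * Returns 0 on success, -1 if the shadow copy is stale.
 */
static int apply_text_poke(uint8_t *shadow, uint64_t shadow_base,
			   const struct text_poke_rec *rec)
{
	uint8_t *dst = shadow + (rec->addr - shadow_base);

	/* Cross-check the old bytes before patching. */
	if (rec->old_len && memcmp(dst, rec->old_bytes, rec->old_len))
		return -1;	/* caller should resynchronize its copy */

	memcpy(dst, rec->new_bytes, rec->new_len);
	return 0;
}

With events emitted for each transition listed above, such a shadow copy
steps through the same 6 (or 8) states as the live kernel text.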