Re: [PATCH v2] arm64: insn: Simulate nop instruction for better uprobe performance

From: Mark Rutland
Date: Thu Oct 10 2024 - 06:59:00 EST


Hi Andrii,

On Wed, Oct 09, 2024 at 04:54:25PM -0700, Andrii Nakryiko wrote:
> On Mon, Sep 9, 2024 at 12:21 AM Liao Chang <liaochang1@xxxxxxxxxx> wrote:

> I'm curious what's the status of this patch? It received no comments
> so far in the last month. Can someone on the ARM64 side of things
> please take a look? (or maybe it was applied to some tree and there
> was just no notification?)
>
> This is a very useful performance optimization for uprobe tracing on
> ARM64, so would be nice to get it in during current release cycle.
> Thank you!

Sorry, I got busy chasing up a bunch of bugs and hadn't gotten round to
this yet.

I've replied with a couple of minor comments and an ack, and I reckon we
can queue this up this cycle. Usually this sort of thing starts to get
queued around -rc3.

Mark.

>
> > diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
> > index 8c0a36f72d6f..dd530d5c3d67 100644
> > --- a/arch/arm64/include/asm/insn.h
> > +++ b/arch/arm64/include/asm/insn.h
> > @@ -549,6 +549,12 @@ static __always_inline bool aarch64_insn_uses_literal(u32 insn)
> > 	       aarch64_insn_is_prfm_lit(insn);
> > }
> >
> > +static __always_inline bool aarch64_insn_is_nop(u32 insn)
> > +{
> > +	return aarch64_insn_is_hint(insn) &&
> > +	       ((insn & 0xFE0) == AARCH64_INSN_HINT_NOP);
> > +}
> > +
> > enum aarch64_insn_encoding_class aarch64_get_insn_class(u32 insn);
> > u64 aarch64_insn_decode_immediate(enum aarch64_insn_imm_type type, u32 insn);
> > u32 aarch64_insn_encode_immediate(enum aarch64_insn_imm_type type,
> > diff --git a/arch/arm64/kernel/probes/decode-insn.c b/arch/arm64/kernel/probes/decode-insn.c
> > index 968d5fffe233..be54539e309e 100644
> > --- a/arch/arm64/kernel/probes/decode-insn.c
> > +++ b/arch/arm64/kernel/probes/decode-insn.c
> > @@ -75,6 +75,15 @@ static bool __kprobes aarch64_insn_is_steppable(u32 insn)
> > enum probe_insn __kprobes
> > arm_probe_decode_insn(probe_opcode_t insn, struct arch_probe_insn *api)
> > {
> > +	/*
> > +	 * While 'nop' instruction can execute in the out-of-line slot,
> > +	 * simulating them in breakpoint handling offers better performance.
> > +	 */
> > +	if (aarch64_insn_is_nop(insn)) {
> > +		api->handler = simulate_nop;
> > +		return INSN_GOOD_NO_SLOT;
> > +	}
> > +
> > 	/*
> > 	 * Instructions reading or modifying the PC won't work from the XOL
> > 	 * slot.
> > diff --git a/arch/arm64/kernel/probes/simulate-insn.c b/arch/arm64/kernel/probes/simulate-insn.c
> > index 22d0b3252476..5e4f887a074c 100644
> > --- a/arch/arm64/kernel/probes/simulate-insn.c
> > +++ b/arch/arm64/kernel/probes/simulate-insn.c
> > @@ -200,3 +200,14 @@ simulate_ldrsw_literal(u32 opcode, long addr, struct pt_regs *regs)
> >
> > 	instruction_pointer_set(regs, instruction_pointer(regs) + 4);
> > }
> > +
> > +void __kprobes
> > +simulate_nop(u32 opcode, long addr, struct pt_regs *regs)
> > +{
> > +	/*
> > +	 * Compared to instruction_pointer_set(), it offers better
> > +	 * compatibility with single-stepping and execution in target
> > +	 * guarded memory.
> > +	 */
> > +	arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE);
> > +}
> > diff --git a/arch/arm64/kernel/probes/simulate-insn.h b/arch/arm64/kernel/probes/simulate-insn.h
> > index e065dc92218e..efb2803ec943 100644
> > --- a/arch/arm64/kernel/probes/simulate-insn.h
> > +++ b/arch/arm64/kernel/probes/simulate-insn.h
> > @@ -16,5 +16,6 @@ void simulate_cbz_cbnz(u32 opcode, long addr, struct pt_regs *regs);
> > void simulate_tbz_tbnz(u32 opcode, long addr, struct pt_regs *regs);
> > void simulate_ldr_literal(u32 opcode, long addr, struct pt_regs *regs);
> > void simulate_ldrsw_literal(u32 opcode, long addr, struct pt_regs *regs);
> > +void simulate_nop(u32 opcode, long addr, struct pt_regs *regs);
> >
> > #endif /* _ARM_KERNEL_KPROBES_SIMULATE_INSN_H */
> > --
> > 2.34.1
> >
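
For anyone following along, the decode check in the quoted patch can be
sketched in plain userspace C. This is only an illustration, not the
kernel code: the mask and value constants below are my assumption of what
aarch64_insn_is_hint() and AARCH64_INSN_HINT_NOP expand to, and the names
insn_is_hint(), insn_is_nop() and simulate_nop_pc() are hypothetical. The
PC helper models only the "advance by one instruction" step; whatever else
arm64_skip_faulting_instruction() does is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encodings: HINT is 0xD503201F with CRm:op2 in bits [11:5]. */
#define HINT_MASK     0xFFFFF01Fu /* every bit except CRm:op2 */
#define HINT_VALUE    0xD503201Fu /* fixed bits of the HINT space */
#define HINT_OP_MASK  0x00000FE0u /* CRm:op2, selects the specific hint */
#define HINT_NOP      (0x0u << 5) /* NOP is hint #0 */
#define INSN_SIZE     4u          /* A64 instructions are 4 bytes */

/* True if insn lies anywhere in the HINT encoding space. */
static bool insn_is_hint(uint32_t insn)
{
	return (insn & HINT_MASK) == HINT_VALUE;
}

/* True only for NOP itself, not for other hints such as YIELD. */
static bool insn_is_nop(uint32_t insn)
{
	return insn_is_hint(insn) && (insn & HINT_OP_MASK) == HINT_NOP;
}

/* Simulating a NOP has no architectural effect besides moving the PC on. */
static uint64_t simulate_nop_pc(uint64_t pc)
{
	return pc + INSN_SIZE;
}

/* A few known encodings to sanity-check the masks. */
static void check_examples(void)
{
	assert(insn_is_nop(0xD503201Fu));   /* NOP */
	assert(insn_is_hint(0xD503203Fu));  /* YIELD is a hint ... */
	assert(!insn_is_nop(0xD503203Fu));  /* ... but not a NOP */
	assert(!insn_is_nop(0x91000000u));  /* ADD x0, x0, #0 */
}
```

Calling check_examples() from a test harness exercises the three cases
that matter: the NOP encoding itself, another hint that must not be
treated as a NOP, and an unrelated instruction.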