Re: [PATCH v4 2/7] arm64: introduce interfaces to hotpatch kerneland module code

From: Will Deacon
Date: Thu Oct 17 2013 - 07:39:09 EST


[adding Tixy for stop_machine() question below]

On Thu, Oct 17, 2013 at 07:19:35AM +0100, Jiang Liu wrote:
> From: Jiang Liu <jiang.liu@xxxxxxxxxx>
>
> Introduce three interfaces to patch kernel and module code:
> aarch64_insn_patch_text_nosync():
> patch code without synchronization, it's caller's responsibility
> to synchronize all CPUs if needed.
> aarch64_insn_patch_text_sync():
> patch code and always synchronize with stop_machine()
> aarch64_insn_patch_text():
> patch code and synchronize with stop_machine() if needed
>
> Signed-off-by: Jiang Liu <jiang.liu@xxxxxxxxxx>
> Cc: Jiang Liu <liuj97@xxxxxxxxx>
> ---
> arch/arm64/include/asm/insn.h | 24 +++++++++++++-
> arch/arm64/kernel/insn.c | 77 +++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 100 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
> index 6190016..fc439b9 100644
> --- a/arch/arm64/include/asm/insn.h
> +++ b/arch/arm64/include/asm/insn.h
> @@ -60,8 +60,30 @@ __AARCH64_INSN_FUNCS(nop, 0xFFFFFFFF, 0xD503201F)
>
> #undef __AARCH64_INSN_FUNCS
>
> -enum aarch64_insn_encoding_class aarch64_get_insn_class(u32 insn);
> +/*
> + * In ARMv8-A, A64 instructions have a fixed length of 32 bits and are always
> + * little-endian. On the other hand, SCTLR_EL1.EE (bit 25, Exception Endianness)
> + * flag controls endianness for EL1 explicit data accesses and stage 1
> + * translation table walks as below:
> + * 0: little-endian
> + * 1: big-endian
> + * So need to handle endianness when patching kernel code.
> + */

You can delete this comment now that we're using the helpers...

> +static __always_inline u32 aarch64_insn_read(void *addr)
> +{
> + return le32_to_cpu(*(u32 *)addr);
> +}
>
> +static __always_inline void aarch64_insn_write(void *addr, u32 insn)
> +{
> + *(u32 *)addr = cpu_to_le32(insn);
> +}

... then just inline these calls directly.

> +enum aarch64_insn_encoding_class aarch64_get_insn_class(u32 insn);
> bool aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn);
>
> +int aarch64_insn_patch_text_nosync(void *addrs[], u32 insns[], int cnt);
> +int aarch64_insn_patch_text_sync(void *addrs[], u32 insns[], int cnt);
> +int aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt);
> +
> #endif /* _ASM_ARM64_INSN_H */
> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> index 1c501f3..8dd5fbe 100644
> --- a/arch/arm64/kernel/insn.c
> +++ b/arch/arm64/kernel/insn.c
> @@ -16,6 +16,9 @@
> */
> #include <linux/compiler.h>
> #include <linux/kernel.h>
> +#include <linux/smp.h>
> +#include <linux/stop_machine.h>
> +#include <asm/cacheflush.h>
> #include <asm/insn.h>
>
> static int aarch64_insn_encoding_cls[] = {
> @@ -70,3 +73,77 @@ bool __kprobes aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn)
> return __aarch64_insn_hotpatch_safe(old_insn) &&
> __aarch64_insn_hotpatch_safe(new_insn);
> }
> +
> +int __kprobes aarch64_insn_patch_text_nosync(void *addrs[], u32 insns[],
> + int cnt)
> +{
> + int i;
> + u32 *tp;
> +
> + if (cnt <= 0)
> + return -EINVAL;

Isn't cnt always 1 for the _nosync patching? Can you just drop the argument
and simplify this code? Patching a sequence without syncing is always racy.

> + for (i = 0; i < cnt; i++) {
> + tp = addrs[i];
> + /* A64 instructions must be word aligned */
> + if ((uintptr_t)tp & 0x3)
> + return -EINVAL;
> + aarch64_insn_write(tp, insns[i]);
> + flush_icache_range((uintptr_t)tp, (uintptr_t)tp + sizeof(u32));
> + }
> +
> + return 0;
> +}
> +
> +struct aarch64_insn_patch {
> + void **text_addrs;
> + u32 *new_insns;
> + int insn_cnt;
> +};
> +
> +static int __kprobes aarch64_insn_patch_text_cb(void *arg)
> +{
> + struct aarch64_insn_patch *pp = arg;
> +
> + return aarch64_insn_patch_text_nosync(pp->text_addrs, pp->new_insns,
> + pp->insn_cnt);
> +}
> +
> +int __kprobes aarch64_insn_patch_text_sync(void *addrs[], u32 insns[], int cnt)
> +{
> + struct aarch64_insn_patch patch = {
> + .text_addrs = addrs,
> + .new_insns = insns,
> + .insn_cnt = cnt,
> + };
> +
> + if (cnt <= 0)
> + return -EINVAL;
> +
> + /*
> + * Execute __aarch64_insn_patch_text() on every online CPU,
> + * which ensure serialization among all online CPUs.
> + */
> + return stop_machine(aarch64_insn_patch_text_cb, &patch, NULL);
> +}

Whoa, whoa, whoa! The comment here is wrong -- we only run the patching on
*one* CPU, which is the right thing to do. However, the arch/arm/ call to
stop_machine in kprobes does actually run the patching code on *all* the
online cores (including the cache flushing!). I think this is to work around
cores without hardware cache maintenance broadcasting, but that could easily
be called out specially (like we do in patch.c) and the flushing could be
separated from the patching too.

> +int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
> +{
> + int ret;
> +
> + if (cnt == 1 && aarch64_insn_hotpatch_safe(aarch64_insn_read(addrs[0]),
> + insns[0])) {

You could make aarch64_insn_hotpatch_safe take the cnt parameter and return
false if cnt != 1.

> + /*
> + * It doesn't guarantee all CPUs see the new instruction

"It"? You mean the ARMv8 architecture?

Will
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/