Re: [PATCH] ARM: Add a memory clobber to the fmrx instruction

From: Nathan Chancellor
Date: Tue Apr 09 2024 - 12:46:52 EST


+ Ard

On Tue, Apr 09, 2024 at 07:38:44PM +0800, zhuqiuer wrote:
> The fmrx instruction is used throughout the kernel, and in some
> places it is expected to be skipped by incrementing the program
> counter, such as in vfpmodule.c:vfp_init().
> The compiler must therefore not reorder it where that is not intended.
> Placing barrier() before and after this call does not prevent
> reordering by the compiler, because the asm's only operand is an '=r'
> output: it works on a general-purpose register, not on memory.
> A memory clobber is needed to keep the instruction in order
> after compilation.
>
> Below is the disassembly of vfpmodule.c:vfp_init(),
> compiled by LLVM.
>
> Before the patch:
> xxxxx: xxxxx bl c010c688 <register_undef_hook>
> xxxxx: xxxxx mov r0, r4
> xxxxx: xxxxx bl c010c6e4 <unregister_undef_hook>
> ...
> xxxxx: xxxxx bl c0791c8c <printk>
> xxxxx: xxxxx movw r5, #23132 ; 0x5a5c
> xxxxx: xxxxx vmrs r4, fpsid <- this is the fmrx instruction
>
> After the patch:
> xxxxx: xxxxx bl c010c688 <register_undef_hook>
> xxxxx: xxxxx mov r0, r4
> xxxxx: xxxxx vmrs r5, fpsid <- this is the fmrx instruction
> xxxxx: xxxxx bl c010c6e4 <unregister_undef_hook>
>
> Signed-off-by: zhuqiuer <zhuqiuer1@xxxxxxxxxx>
> ---
> arch/arm/vfp/vfpinstr.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm/vfp/vfpinstr.h b/arch/arm/vfp/vfpinstr.h
> index 3c7938fd40aa..e70129e10b8e 100644
> --- a/arch/arm/vfp/vfpinstr.h
> +++ b/arch/arm/vfp/vfpinstr.h
> @@ -68,7 +68,7 @@
> u32 __v; \
> asm(".fpu vfpv2\n" \
> "vmrs %0, " #_vfp_ \
> - : "=r" (__v) : : "cc"); \
> + : "=r" (__v) : : "memory", "cc"); \
> __v; \
> })
>
> --
> 2.12.3
>

This seems like the same issue that Ard was addressing with this patch
at https://lore.kernel.org/20240318093004.117153-2-ardb+git@xxxxxxxxxx/.
Does that change work for your situation as well? I do not really have a
strong preference between the two approaches. Ard also mentioned using
*current in the asm constraints as another option.
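
For reference, here is a rough sketch of how the constraint lists under
discussion differ. It is not the kernel code: the asm templates are left
empty so that it builds on any architecture with GCC or Clang, the
function names are made up, and fake_current is a hypothetical stand-in
for *current. The "0" input only exists so the empty template yields a
defined value.

```c
#include <stdint.h>

/* Compiler-only barrier, equivalent in spirit to the kernel's barrier(). */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical stand-in for the object that `current` points to. */
struct fake_task { int state; };
static struct fake_task fake_current;

/*
 * Register-only constraints, as in the macro before the patch.
 * Nothing ties the asm to memory, so the compiler should be free
 * to move it across barrier() or across function calls.
 */
static inline uint32_t read_reg_only(uint32_t seed)
{
	uint32_t v;

	__asm__("" : "=r" (v) : "0" (seed) : "cc");
	return v;
}

/*
 * With a "memory" clobber, as in the patch. The asm is treated as
 * possibly reading or writing any memory, so it should stay ordered
 * against other memory accesses and barrier().
 */
static inline uint32_t read_mem_clobber(uint32_t seed)
{
	uint32_t v;

	__asm__("" : "=r" (v) : "0" (seed) : "memory", "cc");
	return v;
}

/*
 * With a memory input operand instead of a full clobber (assumed
 * shape of the *current idea). The asm only depends on one object,
 * which is narrower than clobbering all of memory.
 */
static inline uint32_t read_mem_operand(uint32_t seed)
{
	uint32_t v;

	__asm__("" : "=r" (v) : "0" (seed), "m" (fake_current) : "cc");
	return v;
}

/* Example caller: the read is meant to stay between the two barriers. */
static inline uint32_t read_between_barriers(uint32_t seed)
{
	uint32_t v;

	barrier();
	v = read_mem_clobber(seed);
	barrier();
	return v;
}
```

All three variants return the same value; the difference is only in
what the compiler is allowed to do with the asm statement, which shows
up in the generated code rather than in the result.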

Cheers,
Nathan