Re: [PATCH] ARM: vfp: Use asm volatile in fmrx/fmxr macros

From: Calvin Owens
Date: Sun Jul 28 2024 - 17:29:58 EST


On Sunday 07/28 at 19:09 +0200, Ard Biesheuvel wrote:
> (cc Arnd, Nathan, zhuqiuer)
>
> On Sun, 28 Jul 2024 at 10:21, Calvin Owens <calvin@xxxxxxxxxx> wrote:
> > <snip>
> >
> > Crudely grepping for vmsr/vmrs instructions in the otherwise nearly
> > idential text for vfp_support_entry() makes the problem obvious:
> >
> > vmlinux.llvm.good [0xc0101cb8] <+48>: vmrs r7, fpexc
> > vmlinux.llvm.good [0xc0101cd8] <+80>: vmsr fpexc, r0
> > vmlinux.llvm.good [0xc0101d20] <+152>: vmsr fpexc, r7
> > vmlinux.llvm.good [0xc0101d38] <+176>: vmrs r4, fpexc
> > vmlinux.llvm.good [0xc0101d6c] <+228>: vmrs r0, fpscr
> > vmlinux.llvm.good [0xc0101dc4] <+316>: vmsr fpexc, r0
> > vmlinux.llvm.good [0xc0101dc8] <+320>: vmrs r0, fpsid
> > vmlinux.llvm.good [0xc0101dcc] <+324>: vmrs r6, fpscr
> > vmlinux.llvm.good [0xc0101e10] <+392>: vmrs r10, fpinst
> > vmlinux.llvm.good [0xc0101eb8] <+560>: vmrs r10, fpinst2
> >
> > vmlinux.llvm.bad [0xc0101cb8] <+48>: vmrs r7, fpexc
> > vmlinux.llvm.bad [0xc0101cd8] <+80>: vmsr fpexc, r0
> > vmlinux.llvm.bad [0xc0101d20] <+152>: vmsr fpexc, r7
> > vmlinux.llvm.bad [0xc0101d30] <+168>: vmrs r0, fpscr
> > vmlinux.llvm.bad [0xc0101d50] <+200>: vmrs r6, fpscr <== BOOM!
> > vmlinux.llvm.bad [0xc0101d6c] <+228>: vmsr fpexc, r0
> > vmlinux.llvm.bad [0xc0101d70] <+232>: vmrs r0, fpsid
> > vmlinux.llvm.bad [0xc0101da4] <+284>: vmrs r10, fpinst
> > vmlinux.llvm.bad [0xc0101df8] <+368>: vmrs r4, fpexc
> > vmlinux.llvm.bad [0xc0101e5c] <+468>: vmrs r10, fpinst2
> >
> > I think LLVM's reordering is valid as the code is currently written: the
> > compiler doesn't know the instructions have side effects in hardware.
> >
> > Fix by using "asm volatile" in fmxr() and fmrx(), so they cannot be
> > reordered with respect to each other. The original compiler now produces
> > working kernels on my hardware with DYNAMIC_DEBUG=n.
> >
> > This is the relevant piece of the diff of the vfp_support_entry() text,
> > from the original oopsing kernel to a working kernel with this patch:
> >
> > vmrs r0, fpscr
> > tst r0, #4096
> > bne 0xc0101d48
> > tst r0, #458752
> > beq 0xc0101ecc
> > orr r7, r7, #536870912
> > ldr r0, [r4, #0x3c]
> > mov r9, #16
> > -vmrs r6, fpscr
> > orr r9, r9, #251658240
> > add r0, r0, #4
> > str r0, [r4, #0x3c]
> > mvn r0, #159
> > sub r0, r0, #-1207959552
> > and r0, r7, r0
> > vmsr fpexc, r0
> > vmrs r0, fpsid
> > +vmrs r6, fpscr
> > and r0, r0, #983040
> > cmp r0, #65536
> > bne 0xc0101d88
> >
> > Fixes: 4708fb041346 ("ARM: vfp: Reimplement VFP exception entry in C code")
> > Signed-off-by: Calvin Owens <calvin@xxxxxxxxxx>
> > ---
> > arch/arm/vfp/vfpinstr.h | 48 ++++++++++++++++++++++-------------------
> > 1 file changed, 26 insertions(+), 22 deletions(-)
> >
>
> Thanks for the patch, and for the excellent analysis.
>
> Note that this fix has been proposed in the past, as well as another
> one addressing the same issue, but we've been incredibly sloppy
> getting it merged.
>
> https://lore.kernel.org/linux-arm-kernel/20240410024126.21589-2-zhuqiuer1@xxxxxxxxxx/
> https://lore.kernel.org/linux-arm-kernel/20240318093004.117153-2-ardb+git@xxxxxxxxxx/

Ah sorry for missing that, I searched for the symptom not the cure.

> What both of us appear to have missed is that there are two versions
> of these routines, which should either be dropped (as they are
> obsolete now that the minimum binutils version is 2.25) or fixed up as
> well, as you do below.
>
> Anyone have any thoughts on using a memory clobber as opposed to
> volatile? I think volatile means that the access cannot be elided, but
> it is unclear to me whether that implies any ordering. A 'memory'
> clobber implies that globally visible state is updated, which seems
> like a stronger guarantee to me.

My thinking was that if 'asm volatile' is sufficient, it will suppress
less optimization than the clobber, since the clobber might force the
compiler to assume unrelated memory could have been modified when it
really never is. But I'm not sure about that.

Out of curiousity, I tried it both ways with the same compiler just now,
the only tiny difference in the emitted vfp_support_entry() is here:

--- /volatile 2024-07-28 13:28:59.890505404 -0700
+++ /memclobber 2024-07-28 13:28:59.890505404 -0700
str r0, [r5, #0x4]
vmrs r7, fpexc
tst r7, #1073741824
bne 0xc0101d28
orr r7, r7, #1073741824
bic r0, r7, #-2147483648
vmsr fpexc, r0
+ldr r8, [pc, #0x26c]
ldr r0, [r5, #0x8]
-ldr r8, [pc, #0x268]
add r6, r5, #224
ldr r0, [r8, r0, lsl #2]
cmp r0, r6
beq 0xc0101d18

Thanks,
Calvin

> In any case, let's work together to get /some/ version of this fix merged asap.
>
> > diff --git a/arch/arm/vfp/vfpinstr.h b/arch/arm/vfp/vfpinstr.h
> > index 3c7938fd40aa..32090b0fb250 100644
> > --- a/arch/arm/vfp/vfpinstr.h
> > +++ b/arch/arm/vfp/vfpinstr.h
> > @@ -64,33 +64,37 @@
> >
> > #ifdef CONFIG_AS_VFP_VMRS_FPINST
> >
> > -#define fmrx(_vfp_) ({ \
> > - u32 __v; \
> > - asm(".fpu vfpv2\n" \
> > - "vmrs %0, " #_vfp_ \
> > - : "=r" (__v) : : "cc"); \
> > - __v; \
> > - })
> > -
> > -#define fmxr(_vfp_,_var_) \
> > - asm(".fpu vfpv2\n" \
> > - "vmsr " #_vfp_ ", %0" \
> > - : : "r" (_var_) : "cc")
> > +#define fmrx(_vfp_) ({ \
> > + u32 __v; \
> > + asm volatile (".fpu vfpv2\n" \
> > + "vmrs %0, " #_vfp_ \
> > + : "=r" (__v) : : "cc"); \
> > + __v; \
> > +})
> > +
> > +#define fmxr(_vfp_, _var_) ({ \
> > + asm volatile (".fpu vfpv2\n" \
> > + "vmsr " #_vfp_ ", %0" \
> > + : : "r" (_var_) : "cc"); \
> > +})
> >
> > #else
> >
> > #define vfpreg(_vfp_) #_vfp_
> >
> > -#define fmrx(_vfp_) ({ \
> > - u32 __v; \
> > - asm("mrc p10, 7, %0, " vfpreg(_vfp_) ", cr0, 0 @ fmrx %0, " #_vfp_ \
> > - : "=r" (__v) : : "cc"); \
> > - __v; \
> > - })
> > -
> > -#define fmxr(_vfp_,_var_) \
> > - asm("mcr p10, 7, %0, " vfpreg(_vfp_) ", cr0, 0 @ fmxr " #_vfp_ ", %0" \
> > - : : "r" (_var_) : "cc")
> > +#define fmrx(_vfp_) ({ \
> > + u32 __v; \
> > + asm volatile ("mrc p10, 7, %0, " vfpreg(_vfp_) "," \
> > + "cr0, 0 @ fmrx %0, " #_vfp_ \
> > + : "=r" (__v) : : "cc"); \
> > + __v; \
> > +})
> > +
> > +#define fmxr(_vfp_, _var_) ({ \
> > + asm volatile ("mcr p10, 7, %0, " vfpreg(_vfp_) "," \
> > + "cr0, 0 @ fmxr " #_vfp_ ", %0" \
> > + : : "r" (_var_) : "cc"); \
> > +})
> >
> > #endif
> >
> > --
> > 2.39.2
> >