Re: [PATCH] ARM: vfp: Use asm volatile in fmrx/fmxr macros

From: Calvin Owens
Date: Mon Jul 29 2024 - 12:07:46 EST


On Monday 07/29 at 09:12 +0200, Ard Biesheuvel wrote:
> On Sun, 28 Jul 2024 at 23:29, Calvin Owens <calvin@xxxxxxxxxx> wrote:
> >
> > On Sunday 07/28 at 19:09 +0200, Ard Biesheuvel wrote:
> > > (cc Arnd, Nathan, zhuqiuer)
> > >
> > > On Sun, 28 Jul 2024 at 10:21, Calvin Owens <calvin@xxxxxxxxxx> wrote:
> > > > <snip>
> > > >
> > > > Crudely grepping for vmsr/vmrs instructions in the otherwise nearly
> > > > idential text for vfp_support_entry() makes the problem obvious:
> > > >
> > > > vmlinux.llvm.good [0xc0101cb8] <+48>: vmrs r7, fpexc
> > > > vmlinux.llvm.good [0xc0101cd8] <+80>: vmsr fpexc, r0
> > > > vmlinux.llvm.good [0xc0101d20] <+152>: vmsr fpexc, r7
> > > > vmlinux.llvm.good [0xc0101d38] <+176>: vmrs r4, fpexc
> > > > vmlinux.llvm.good [0xc0101d6c] <+228>: vmrs r0, fpscr
> > > > vmlinux.llvm.good [0xc0101dc4] <+316>: vmsr fpexc, r0
> > > > vmlinux.llvm.good [0xc0101dc8] <+320>: vmrs r0, fpsid
> > > > vmlinux.llvm.good [0xc0101dcc] <+324>: vmrs r6, fpscr
> > > > vmlinux.llvm.good [0xc0101e10] <+392>: vmrs r10, fpinst
> > > > vmlinux.llvm.good [0xc0101eb8] <+560>: vmrs r10, fpinst2
> > > >
> > > > vmlinux.llvm.bad [0xc0101cb8] <+48>: vmrs r7, fpexc
> > > > vmlinux.llvm.bad [0xc0101cd8] <+80>: vmsr fpexc, r0
> > > > vmlinux.llvm.bad [0xc0101d20] <+152>: vmsr fpexc, r7
> > > > vmlinux.llvm.bad [0xc0101d30] <+168>: vmrs r0, fpscr
> > > > vmlinux.llvm.bad [0xc0101d50] <+200>: vmrs r6, fpscr <== BOOM!
> > > > vmlinux.llvm.bad [0xc0101d6c] <+228>: vmsr fpexc, r0
> > > > vmlinux.llvm.bad [0xc0101d70] <+232>: vmrs r0, fpsid
> > > > vmlinux.llvm.bad [0xc0101da4] <+284>: vmrs r10, fpinst
> > > > vmlinux.llvm.bad [0xc0101df8] <+368>: vmrs r4, fpexc
> > > > vmlinux.llvm.bad [0xc0101e5c] <+468>: vmrs r10, fpinst2
> > > >
> > > > I think LLVM's reordering is valid as the code is currently written: the
> > > > compiler doesn't know the instructions have side effects in hardware.
> > > >
> > > > Fix by using "asm volatile" in fmxr() and fmrx(), so they cannot be
> > > > reordered with respect to each other. The original compiler now produces
> > > > working kernels on my hardware with DYNAMIC_DEBUG=n.
> > > >
> > > > This is the relevant piece of the diff of the vfp_support_entry() text,
> > > > from the original oopsing kernel to a working kernel with this patch:
> > > >
> > > > vmrs r0, fpscr
> > > > tst r0, #4096
> > > > bne 0xc0101d48
> > > > tst r0, #458752
> > > > beq 0xc0101ecc
> > > > orr r7, r7, #536870912
> > > > ldr r0, [r4, #0x3c]
> > > > mov r9, #16
> > > > -vmrs r6, fpscr
> > > > orr r9, r9, #251658240
> > > > add r0, r0, #4
> > > > str r0, [r4, #0x3c]
> > > > mvn r0, #159
> > > > sub r0, r0, #-1207959552
> > > > and r0, r7, r0
> > > > vmsr fpexc, r0
> > > > vmrs r0, fpsid
> > > > +vmrs r6, fpscr
> > > > and r0, r0, #983040
> > > > cmp r0, #65536
> > > > bne 0xc0101d88
> > > >
> > > > Fixes: 4708fb041346 ("ARM: vfp: Reimplement VFP exception entry in C code")
> > > > Signed-off-by: Calvin Owens <calvin@xxxxxxxxxx>
> > > > ---
> > > > arch/arm/vfp/vfpinstr.h | 48 ++++++++++++++++++++++-------------------
> > > > 1 file changed, 26 insertions(+), 22 deletions(-)
> > > >
> > >
> > > Thanks for the patch, and for the excellent analysis.
> > >
> > > Note that this fix has been proposed in the past, as well as another
> > > one addressing the same issue, but we've been incredibly sloppy
> > > getting it merged.
> > >
> > > https://lore.kernel.org/linux-arm-kernel/20240410024126.21589-2-zhuqiuer1@xxxxxxxxxx/
> > > https://lore.kernel.org/linux-arm-kernel/20240318093004.117153-2-ardb+git@xxxxxxxxxx/
> >
> > Ah sorry for missing that, I searched for the symptom not the cure.
> >
> > > What both of us appear to have missed is that there are two versions
> > > of these routines, which should either be dropped (as they are
> > > obsolete now that the minimum binutils version is 2.25) or fixed up as
> > > well, as you do below.
> > >
> > > Anyone have any thoughts on using a memory clobber as opposed to
> > > volatile? I think volatile means that the access cannot be elided, but
> > > it is unclear to me whether that implies any ordering. A 'memory'
> > > clobber implies that globally visible state is updated, which seems
> > > like a stronger guarantee to me.
> >
> > My thinking was that if 'asm volatile' is sufficient, it will suppress
> > less optimization than the clobber, since the clobber might force the
> > compiler to assume unrelated memory could have been modified when it
> > really never is. But I'm not sure about that.
> >
> > Out of curiousity, I tried it both ways with the same compiler just now,
> > the only tiny difference in the emitted vfp_support_entry() is here:
> >
> > --- /volatile 2024-07-28 13:28:59.890505404 -0700
> > +++ /memclobber 2024-07-28 13:28:59.890505404 -0700
> > str r0, [r5, #0x4]
> > vmrs r7, fpexc
> > tst r7, #1073741824
> > bne 0xc0101d28
> > orr r7, r7, #1073741824
> > bic r0, r7, #-2147483648
> > vmsr fpexc, r0
> > +ldr r8, [pc, #0x26c]
> > ldr r0, [r5, #0x8]
> > -ldr r8, [pc, #0x268]
> > add r6, r5, #224
> > ldr r0, [r8, r0, lsl #2]
> > cmp r0, r6
> > beq 0xc0101d18
> >
>
> Right. So it doesn't matter in practice - good to know.
>
> So in the 'volatile' case, I guess we are relying on the compiler not
> reordering those with respect to each other, which should be
> sufficient to ensure that we do not access VFP status register before
> enabling the VFP unit via the control register, as all are accessed
> using inline asm.
>
> In any case, this should go into the patch system.
> https://www.armlinux.org.uk/developer/patches/section.php?section=0

Done:
https://www.armlinux.org.uk/developer/patches/viewpatch.php?id=9410/1

Thanks,
Calvin

> Note to self and other: follow-up with a patch that removes
> CONFIG_AS_VFP_VMRS_FPINST, which is no longer needed.