Re: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable
From: Ard Biesheuvel
Date: Fri Mar 08 2019 - 03:58:02 EST
On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin
<linux@xxxxxxxxxxxxxxx> wrote:
>
> On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:
> > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > >
> > > Passing registers containing zero as both the address (NULL pointer)
> > > and data into cmpxchg_futex_value_locked() leads clang to assign
> > > the same register for both inputs on ARM, which triggers a warning
> > > explaining that this instruction has unpredictable behavior on ARMv5.
> > >
> > > /tmp/futex-7e740e.s: Assembler messages:
> > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base
> > >
> > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,
> > > as Mikael wrote:
> > > "One way of fixing this is to make uaddr an input/output register, since
> > > "that prevents it from overlapping any other input or output."
> > >
> > > but then withdrawn as the warning was determined to be harmless, and it
> > > apparently never showed up again with later gcc versions.
> > >
> > > Now the same problem is back when compiling with clang, and we are trying
> > > to get clang to build the kernel without warnings, as gcc normally does.
> > >
> > > Cc: Mikael Pettersson <mikpe@xxxxxxxx>
> > > Cc: Mikael Pettersson <mikpelinux@xxxxxxxxx>
> > > Cc: Dave Martin <Dave.Martin@xxxxxxx>
> > > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@xxxxxxxxxxxxxxxxxxx/
> > > Signed-off-by: Arnd Bergmann <arnd@xxxxxxxx>
> > > ---
> > > arch/arm/include/asm/futex.h | 10 +++++-----
> > > 1 file changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
> > > index 0a46676b4245..79790912974e 100644
> > > --- a/arch/arm/include/asm/futex.h
> > > +++ b/arch/arm/include/asm/futex.h
> > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
> > > preempt_disable();
> > > __ua_flags = uaccess_save_and_enable();
> > > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
> > > - "1: " TUSER(ldr) " %1, [%4]\n"
> > > - " teq %1, %2\n"
> > > + "1: " TUSER(ldr) " %1, [%2]\n"
> > > + " teq %1, %3\n"
> > > " it eq @ explicit IT needed for the 2b label\n"
> > > - "2: " TUSER(streq) " %3, [%4]\n"
> > > + "2: " TUSER(streq) " %4, [%2]\n"
> > > __futex_atomic_ex_table("%5")
> > > - : "+r" (ret), "=&r" (val)
> > > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)
> > > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr)
> > > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT)
> > > : "cc", "memory");
> > > uaccess_restore(__ua_flags);
> >
> > Underspecification of constraints to extended inline assembly is a
> > common issue exposed by other compilers (and possibly, though in
> > practice infrequently, by compiler upgrades).
> > So the reordering of the constraints means that in the assembly (notes
> > for other reviewers):
> > %2 -> %3
> > %3 -> %4
> > %4 -> %2
> > Yep, looks good to me, thanks for finding this old patch and resending, Arnd!
>
> I don't see what is "underspecified" in the original constraints.
> Please explain.
>
I agree that that statement makes little sense.

As Russell points out in the referenced thread, there is nothing wrong
with the generated assembly, given that the UNPREDICTABLE opcode is
unreachable in practice. Unfortunately, we have no way to flag this
diagnostic as a known false positive, and AFAICT, there is no reason
we couldn't end up with the same diagnostic popping up for GCC builds
in the future, considering that the register assignment matches the
constraints. (We have seen somewhat similar issues where constant
folded function clones are emitted with a constant argument that could
never occur in reality [0])

Given the above, the only meaningful way to invoke this function is
with different registers assigned to %3 and %4, and so tightening the
constraints to guarantee that does not actually result in worse code
(except maybe for the instantiations that we won't ever call in the
first place). So I think we should fix this.

I wonder if just adding

    BUG_ON(__builtin_constant_p(uaddr));

at the beginning makes any difference - this shouldn't result in any
object code differences since the conditional will always evaluate to
false at build time for instantiations we care about.
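
To make the BUG_ON() idea concrete, here is a rough user-space sketch.
It is only an illustration under stated assumptions, not the kernel
code: SKETCH_BUG_ON(), sketch_bug_hits and sketch_cmpxchg() are
hypothetical stand-ins, the compare-and-store is plain non-atomic C
that merely mirrors the ldr/teq/streq sequence, and there is no user
access handling. It needs gcc or clang for __builtin_constant_p().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BUG_ON(): counts hits instead of trapping. */
static int sketch_bug_hits;
#define SKETCH_BUG_ON(cond) do { if (cond) sketch_bug_hits++; } while (0)

/*
 * Hypothetical stand-in for futex_atomic_cmpxchg_inatomic().
 * Not atomic and no fault handling; it only mirrors the shape of the
 * load / compare / conditional store done by the inline asm.
 */
static inline int sketch_cmpxchg(unsigned int *uval, unsigned int *uaddr,
                                 unsigned int oldval, unsigned int newval)
{
    /* Sanity check on the sketch's own interface. */
    assert(uval != NULL && uaddr != NULL);

    /*
     * The check under discussion: it fires only in an instantiation
     * where the compiler has proven uaddr to be a compile-time constant.
     * A result of 0 just means constancy could not be proven.
     */
    SKETCH_BUG_ON(__builtin_constant_p(uaddr));

    *uval = *uaddr;          /* ldr   val, [uaddr]    */
    if (*uval == oldval)     /* teq   val, oldval     */
        *uaddr = newval;     /* streq newval, [uaddr] */
    return 0;
}
```

Calling sketch_cmpxchg() with a pointer that is only known at run time
(for example one read back through a volatile variable) should leave
sketch_bug_hits at zero, which is the "instantiations we care about"
case. Whether a constant-propagated clone would actually trip the
check depends on the compiler and optimization level, so this sketch
does not answer the question posed above.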
[0] https://lore.kernel.org/lkml/9c74d635-d0d1-0893-8093-ce20b0933fc7@xxxxxxxxxx/