Re: [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem

From: Nick Desaulniers
Date: Mon Oct 10 2022 - 17:23:54 EST


On Sat, Jul 16, 2022 at 2:47 AM Arnd Bergmann <arnd@xxxxxxxxxx> wrote:
>
> On Sat, Jul 16, 2022 at 2:16 AM Nick Desaulniers
> <ndesaulniers@xxxxxxxxxx> wrote:
> >
> > Compilers frequently need to defer 64b division to a libcall with this
> > symbol name. It essentially is div64_u64_rem, just with a different
> > signature. Kernel developers know to call div64_u64_rem, but compilers
> > don't.
> >
> > Link: https://lore.kernel.org/lkml/20220524004156.0000790e@xxxxxxxxxxx/
> > Suggested-by: Gary Guo <gary@xxxxxxxxxxx>
> > Signed-off-by: Nick Desaulniers <ndesaulniers@xxxxxxxxxx>

So the existing division by constant issues went away, and Craig was
able to improve division by double-word constants in LLVM
1. https://reviews.llvm.org/D130862
2. https://reviews.llvm.org/D135541
But we still have one instance left that's not div/rem by constant via
CONFIG_FPE_NWFPE=y that's now blocking Android's compiler upgrade.
https://github.com/ClangBuiltLinux/linux/issues/1666

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/nwfpe/softfloat.c#n2312
Any creative ideas on how to avoid this? Perhaps putting the `aSig -=
bSig;` in inline asm? Inserting a `barrier()` or empty asm statement
into the loops also seems to work.

Otherwise I'd define __aeabi_uldivmod as static in
arch/arm/nwfpe/softfloat.c (with the body from this patch) only for
clang.

I see this function seems to be based on Berkeley Softfloat
http://www.jhauser.us/arithmetic/SoftFloat.html
v2. v3 looks like a total rewrite. Looking at v3e, it looks like
float64_rem() is now called f64_rem() and defined in f64_rem.c. It
doesn't look like there's anything from v3 that we could backport to
the kernel's v2 to avoid this.

Otherwise perhaps we just disable OABI_COMPAT for clang. Quite a few
defconfigs explicitly enable FPE_NWFPE though. Are there really a lot
of OABI binaries still in use?

There's also the hidden llvm flag:
`-mllvm -replexitval=never` that seems to work here, though FWICT it's
disabling 3 such loop elisions (I think all three statements in that
do while). That's probably the best way forward here...

https://reviews.llvm.org/D9800 made the decision to do such a
transformation when a loop can be fully elided ("deleted").

>
> This has historically been strongly NAK'd, and I don't think that position
> has changed in the meantime. A variable-argument 64-bit division is
> really expensive, especially on 32-bit machines that lack a native
> 32-bit division instruction, and we don't want developers to accidentally
> insert one in their driver code.
>
> Explicitly calling one of the division helpers in linux/math64.h is the
> established way for driver writers to declare that a particular division
> cannot be turned into a cheaper operation and is never run in a
> performance critical code path. The compiler of course cannot know
> about either of those.
>
> Arnd



--
Thanks,
~Nick Desaulniers