CONFIG_ARCH_SUPPORTS_INT128: Why not mips, s390, powerpc, and alpha?
From: George Spelvin (lkml@xxxxxxx)
Date: Fri Mar 29 2019 - 09:07:27 EST
(Cross-posted in case there are generic issues; please trim if
discussion wanders into single-architecture details.)
I was working on some scaling code that can benefit from 64x64->128-bit
multiplies. GCC supports an __int128 type on processors with hardware
support (including z/Arch and MIPS64), but the support was broken on
early compilers, so it's gated behind CONFIG_ARCH_SUPPORTS_INT128.
Currently, of the ten 64-bit architectures Linux supports, that's
only enabled on x86, ARM, and RISC-V.
SPARC and HP-PA don't have support.
But that leaves Alpha, MIPS, PowerPC, and s390x.
Current mips64, powerpc64, and s390x gcc seems to generate sensible code
for mul_u64_u64_shr() in <linux/math64.h> if I cross-compile them.
I don't have easy access to an Alpha cross-compiler to test, but
as it has UMULH, I suspect it would work, too.
Is there a reason it hasn't been enabled on these platforms?
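For anyone following along without the kernel tree handy, here is a rough userspace sketch of the kind of helper in question: the high half of a 64x64->128-bit multiply, done with __int128 when the compiler offers it and from 32-bit pieces otherwise. This is not the kernel's mul_u64_u64_shr(); the names mulhi() and mulhi_portable() are made up for illustration, and it only tests the compiler macro, not the Kconfig symbol.

```c
#include <stdint.h>

/* Illustrative fallback: high 64 bits of a*b built from 32-bit halves. */
static uint64_t mulhi_portable(uint64_t a, uint64_t b)
{
	uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
	uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
	uint64_t ll = a_lo * b_lo;
	uint64_t lh = a_lo * b_hi;
	uint64_t hl = a_hi * b_lo;
	uint64_t hh = a_hi * b_hi;
	/* Sum of three values below 2^32, so this cannot overflow. */
	uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;

	return hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}

/* Illustrative wrapper: use __int128 only if the compiler claims support. */
static uint64_t mulhi(uint64_t a, uint64_t b)
{
#ifdef __SIZEOF_INT128__
	return (uint64_t)(((unsigned __int128)a * b) >> 64);
#else
	return mulhi_portable(a, b);
#endif
}
```

On a machine with a multiply-high instruction, the __int128 path should ideally compile down to that one instruction, which is the whole attraction.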
There might be a MIPS64r6 issue, since r6 changed from DMULTU
writing the lo and hi registers to DMULU/DMUHU, and gcc 8.3, at
least, doesn't know how to generate inline code for the latter.
(Note that users *also* check __SIZEOF_INT128__, which is defined if GCC
claims to support __int128, so you don't have to worry about 32-bit
compiles or ancient compilers. It only has to be conditional on
FWIW, the code I'm working on has this inner loop:
(https://arxiv.org/abs/1805.10941 for details)
u64 get_random_max64(u64 range, u64 lim)
{
	unsigned __int128 prod;

	do {
		prod = (unsigned __int128)get_random_u64() * range;
	} while (unlikely((u64)prod < lim));
	return prod >> 64;
}
Which turns into these inner loops:
I like that the MIPS code leaves the high half of the product in
the hi register until it tests the low half; I wish PowerPC would
similarly move the mulhdu *after* the loop, like the following
hypothetical MIPS R6 code:
	dmulu	$3, $2, $17
	sltu	$3, $3, $16
	bnezc	$3, .L7
	dmuhu	$2, $2, $17
Or this handwritten Alpha code:
1:	bsr	$26, get_random_u64
	mulq	$0, $9, $1	# $9 is range
	cmpult	$1, $10, $1	# $10 is lim
	bne	$1, 1b
	umulh	$0, $9, $0
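If anyone wants to play with the get_random_max64() loop outside the kernel, a userspace sketch follows. Two things here are my assumptions rather than part of the code above: rand_u64_sketch() is a splitmix64 stand-in for get_random_u64() (not cryptographically secure), and lim is computed inline as 2^64 mod range, which is the rejection threshold the arXiv paper uses; the kernel-side function takes lim from its caller instead.

```c
#include <stdint.h>

#ifndef __SIZEOF_INT128__
#error "this sketch needs a compiler that provides __int128"
#endif

static uint64_t sketch_state = 0x9E3779B97F4A7C15ull;

/* Hypothetical stand-in for get_random_u64(): splitmix64. */
static uint64_t rand_u64_sketch(void)
{
	uint64_t z = (sketch_state += 0x9E3779B97F4A7C15ull);

	z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ull;
	z = (z ^ (z >> 27)) * 0x94D049BB133111EBull;
	return z ^ (z >> 31);
}

/* Uniform value in [0, range); range must be nonzero. */
static uint64_t random_below_sketch(uint64_t range)
{
	/* Unsigned negation gives 2^64 - range, so this is 2^64 mod range. */
	uint64_t lim = -range % range;
	unsigned __int128 prod;

	do {
		prod = (unsigned __int128)rand_u64_sketch() * range;
	} while ((uint64_t)prod < lim);	/* reject the biased low values */

	return (uint64_t)(prod >> 64);
}
```

The low half of the product decides acceptance and the high half is the result, which is why it is nice when the compiler defers the multiply-high until after the loop exits.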