+ Ard, who wrote this code.
On Fri, Nov 06, 2020 at 07:14:36AM +0200, Adrian Ratiu wrote:Due to a Clang bug [1] neon autoloop vectorization does not happen or happens badly with no gains and considering previous GCC experiences which generated unoptimized code which was worse than the default asm implementation, it is safer to default clang builds to the known good generic implementation. The kernel currently supports a minimum Clang version of v10.0.1, see commit 1f7a44f63e6c ("compiler-clang: add build check for clang 10.0.1"). When the bug gets eventually fixed, this commit could be reverted or, if the minimum clang version bump takes a long time, a warning could be added for users to upgrade their compilers like was done for GCC. [1] https://bugs.llvm.org/show_bug.cgi?id=40976 Signed-off-by: Adrian Ratiu <adrian.ratiu@xxxxxxxxxxxxx>
Thank you for the patch! We are also tracking this here:
https://github.com/ClangBuiltLinux/linux/issues/496
It was on my TODO to revist getting the warning eliminated, which likely would have involved a patch like this as well.
I am curious if it is worth revisting or dusting off Arnd's patch in the LLVM bug tracker first. I have not tried it personally. If that is not a worthwhile option, I am fine with this for now. It would be nice to try and get a fix pinned down on the LLVM side at some point but alas, finite amount of resources and people :(
Should no other options come to fruition from further discussions, you can carry my tag forward:
Acked-by: Nathan Chancellor <natechancellor@xxxxxxxxx>
Hopefully others can comment soon.
---
arch/arm/include/asm/xor.h | 3 ++-
arch/arm/lib/Makefile | 3 +++
arch/arm/lib/xor-neon.c | 4 ++++
3 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/arm/include/asm/xor.h b/arch/arm/include/asm/xor.h
index aefddec79286..49937dafaa71 100644
--- a/arch/arm/include/asm/xor.h
+++ b/arch/arm/include/asm/xor.h
@@ -141,7 +141,8 @@ static struct xor_block_template xor_block_arm4regs = {
NEON_TEMPLATES; \
} while (0)
-#ifdef CONFIG_KERNEL_MODE_NEON
+/* disabled on clang/arm due to https://bugs.llvm.org/show_bug.cgi?id=40976 */
+#if defined(CONFIG_KERNEL_MODE_NEON) && !defined(CONFIG_CC_IS_CLANG)
extern struct xor_block_template const xor_block_neon_inner;
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 6d2ba454f25b..53f9e7dd9714 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -43,8 +43,11 @@ endif
$(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S
$(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S
+# disabled on clang/arm due to https://bugs.llvm.org/show_bug.cgi?id=40976
+ifndef CONFIG_CC_IS_CLANG
ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
NEON_FLAGS := -march=armv7-a -mfloat-abi=softfp -mfpu=neon
CFLAGS_xor-neon.o += $(NEON_FLAGS)
obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
endif
+endif
diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
index e1e76186ec23..84c91c48dfa2 100644
--- a/arch/arm/lib/xor-neon.c
+++ b/arch/arm/lib/xor-neon.c
@@ -18,6 +18,10 @@ MODULE_LICENSE("GPL");
* Pull in the reference implementations while instructing GCC (through
* -ftree-vectorize) to attempt to exploit implicit parallelism and emit
* NEON instructions.
+
+ * On Clang the loop vectorizer is enabled by default, but due to a bug
+ * (https://bugs.llvm.org/show_bug.cgi?id=40976) vectorization is broke
+ * so xor-neon is disabled in favor of the default reg implementations.
*/
#ifdef CONFIG_CC_IS_GCC
#pragma GCC optimize "tree-vectorize"
--
2.29.0