Re: [PATCH 2/2] arm: lib: xor-neon: disable clang vectorization

From: Nathan Chancellor
Date: Fri Nov 06 2020 - 05:14:24 EST


+ Ard, who wrote this code.

On Fri, Nov 06, 2020 at 07:14:36AM +0200, Adrian Ratiu wrote:
> Due to a Clang bug [1] neon autoloop vectorization does not happen or
> happens badly with no gains and considering previous GCC experiences
> which generated unoptimized code which was worse than the default asm
> implementation, it is safer to default clang builds to the known good
> generic implementation.
>
> The kernel currently supports a minimum Clang version of v10.0.1, see
> commit 1f7a44f63e6c ("compiler-clang: add build check for clang 10.0.1").
>
> When the bug gets eventually fixed, this commit could be reverted or,
> if the minimum clang version bump takes a long time, a warning could
> be added for users to upgrade their compilers like was done for GCC.
>
> [1] https://bugs.llvm.org/show_bug.cgi?id=40976
>
> Signed-off-by: Adrian Ratiu <adrian.ratiu@xxxxxxxxxxxxx>

Thank you for the patch! We are also tracking this here:

https://github.com/ClangBuiltLinux/linux/issues/496

It was on my TODO to revist getting the warning eliminated, which likely
would have involved a patch like this as well.

I am curious if it is worth revisting or dusting off Arnd's patch in the
LLVM bug tracker first. I have not tried it personally. If that is not a
worthwhile option, I am fine with this for now. It would be nice to try
and get a fix pinned down on the LLVM side at some point but alas,
finite amount of resources and people :(

Should no other options come to fruition from further discussions, you
can carry my tag forward:

Acked-by: Nathan Chancellor <natechancellor@xxxxxxxxx>

Hopefully others can comment soon.

> ---
> arch/arm/include/asm/xor.h | 3 ++-
> arch/arm/lib/Makefile | 3 +++
> arch/arm/lib/xor-neon.c | 4 ++++
> 3 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/include/asm/xor.h b/arch/arm/include/asm/xor.h
> index aefddec79286..49937dafaa71 100644
> --- a/arch/arm/include/asm/xor.h
> +++ b/arch/arm/include/asm/xor.h
> @@ -141,7 +141,8 @@ static struct xor_block_template xor_block_arm4regs = {
> NEON_TEMPLATES; \
> } while (0)
>
> -#ifdef CONFIG_KERNEL_MODE_NEON
> +/* disabled on clang/arm due to https://bugs.llvm.org/show_bug.cgi?id=40976 */
> +#if defined(CONFIG_KERNEL_MODE_NEON) && !defined(CONFIG_CC_IS_CLANG)
>
> extern struct xor_block_template const xor_block_neon_inner;
>
> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index 6d2ba454f25b..53f9e7dd9714 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -43,8 +43,11 @@ endif
> $(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S
> $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S
>
> +# disabled on clang/arm due to https://bugs.llvm.org/show_bug.cgi?id=40976
> +ifndef CONFIG_CC_IS_CLANG
> ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
> NEON_FLAGS := -march=armv7-a -mfloat-abi=softfp -mfpu=neon
> CFLAGS_xor-neon.o += $(NEON_FLAGS)
> obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
> endif
> +endif
> diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> index e1e76186ec23..84c91c48dfa2 100644
> --- a/arch/arm/lib/xor-neon.c
> +++ b/arch/arm/lib/xor-neon.c
> @@ -18,6 +18,10 @@ MODULE_LICENSE("GPL");
> * Pull in the reference implementations while instructing GCC (through
> * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
> * NEON instructions.
> +
> + * On Clang the loop vectorizer is enabled by default, but due to a bug
> + * (https://bugs.llvm.org/show_bug.cgi?id=40976) vectorization is broke
> + * so xor-neon is disabled in favor of the default reg implementations.
> */
> #ifdef CONFIG_CC_IS_GCC
> #pragma GCC optimize "tree-vectorize"
> --
> 2.29.0
>