Re: [PATCH] [v2] coresight: etm4x: avoid build failure with unrolled loops

From: Suzuki K Poulose
Date: Thu Apr 29 2021 - 18:50:21 EST


On 29/04/2021 18:50, Arnd Bergmann wrote:
On Thu, Apr 29, 2021 at 7:37 PM Suzuki K Poulose <suzuki.poulose@xxxxxxx> wrote:
On 29/04/2021 15:57, Arnd Bergmann wrote:
From: Arnd Bergmann <arnd@xxxxxxxx>

clang-12 fails to build the etm4x driver with -fsanitize=array-bounds,
where it decides to unroll certain loops in a way that result in a
C variable getting put into an inline assembly

<instantiation>:1:7: error: expected constant expression in '.inst' directive
.inst (0xd5200000|((((2) << 19) | ((1) << 16) | (((((((((((0x160 + (i * 4))))) >> 2))) >> 7) & 0x7)) << 12) | ((((((((((0x160 + (i * 4))))) >> 2))) & 0xf)) << 8) | (((((((((((0x160 + (i * 4))))) >> 2))) >> 4) & 0x7)) << 5)))|(.L__reg_num_x8))
^
drivers/hwtracing/coresight/coresight-etm4x-core.c:702:4: note: while in macro instantiation
etm4x_relaxed_read32(csa, TRCCNTVRn(i));
^
drivers/hwtracing/coresight/coresight-etm4x.h:403:4: note: expanded from macro 'etm4x_relaxed_read32'
read_etm4x_sysreg_offset((offset), false)))
^
drivers/hwtracing/coresight/coresight-etm4x.h:383:12: note: expanded from macro 'read_etm4x_sysreg_offset'
__val = read_etm4x_sysreg_const_offset((offset)); \
^
drivers/hwtracing/coresight/coresight-etm4x.h:149:2: note: expanded from macro 'read_etm4x_sysreg_const_offset'
READ_ETM4x_REG(ETM4x_OFFSET_TO_REG(offset))
^
drivers/hwtracing/coresight/coresight-etm4x.h:144:2: note: expanded from macro 'READ_ETM4x_REG'
read_sysreg_s(ETM4x_REG_NUM_TO_SYSREG((reg)))
^
arch/arm64/include/asm/sysreg.h:1108:15: note: expanded from macro 'read_sysreg_s'
asm volatile(__mrs_s("%0", r) : "=r" (__val)); \
^
arch/arm64/include/asm/sysreg.h:1074:2: note: expanded from macro '__mrs_s'
" mrs_s " v ", " __stringify(r) "\n" \
^

This only happened in a few loops in which the array bounds sanitizer
added a special case for an array overflow that clang determined to be
possible, but any compiler is free to unroll any of the loops in the
same way that breaks the sysreg macros.

Introduce helper functions that perform a sysreg access with a
non-constant register number and use them in each call that passes
a loop counter.

You don't need to add this special helper. We have the exact
infrastructure already. So these could simply be replaced with:

csdev_access_xxx(csa, ...)

see :

include/linux/coresight.h

Ah, nice!

Do you mean replacing only the ones that use a nonconstant
offset, or all of them? I guess changing all would avoid some

Only the CLANG "nonconstant" please. Going through an indirect
function call and large "switch..case" for every single register
access is painful. Which is why I decided to add those ugly macros.

So, please leave the rest as they are.

Suzuki


really ugly magic macros, but the indirect function call and the
switch() adds a few cycles of overhead every time and the code
looks like it is micro-optimized for fast register access here.

Arnd