Re: [PATCH] riscv: Fix build with CONFIG_CC_OPTIMIZE_FOR_SIZE=y

From: Atish Patra
Date: Fri Sep 23 2022 - 14:01:48 EST


On Fri, Sep 23, 2022 at 12:18 AM Heiko Stuebner <heiko@xxxxxxxxx> wrote:
>
> Hi,
>
> Am Donnerstag, 22. September 2022, 17:52:46 CEST schrieb Jessica Clarke:
> > On 22 Sept 2022, at 16:45, Heiko Stuebner <heiko@xxxxxxxxx> wrote:
> > >
> > > Am Donnerstag, 22. September 2022, 08:09:58 CEST schrieb Samuel Holland:
> > >> commit 8eb060e10185 ("arch/riscv: add Zihintpause support") broke
> > >> building with CONFIG_CC_OPTIMIZE_FOR_SIZE enabled (gcc 11.1.0):
> > >>
> > >> CC arch/riscv/kernel/vdso/vgettimeofday.o
> > >> In file included from <command-line>:
> > >> ./arch/riscv/include/asm/jump_label.h: In function 'cpu_relax':
> > >> ././include/linux/compiler_types.h:285:33: warning: 'asm' operand 0 probably does not match constraints
> > >> 285 | #define asm_volatile_goto(x...) asm goto(x)
> > >> | ^~~
> > >> ./arch/riscv/include/asm/jump_label.h:41:9: note: in expansion of macro 'asm_volatile_goto'
> > >> 41 | asm_volatile_goto(
> > >> | ^~~~~~~~~~~~~~~~~
> > >> ././include/linux/compiler_types.h:285:33: error: impossible constraint in 'asm'
> > >> 285 | #define asm_volatile_goto(x...) asm goto(x)
> > >> | ^~~
> > >> ./arch/riscv/include/asm/jump_label.h:41:9: note: in expansion of macro 'asm_volatile_goto'
> > >> 41 | asm_volatile_goto(
> > >> | ^~~~~~~~~~~~~~~~~
> > >> make[1]: *** [scripts/Makefile.build:249: arch/riscv/kernel/vdso/vgettimeofday.o] Error 1
> > >> make: *** [arch/riscv/Makefile:128: vdso_prepare] Error 2
> > >>
> > >> Having a static branch in cpu_relax() is problematic because that
> > >> function is widely inlined, including in some quite complex functions
> > >> like in the VDSO. A quick measurement shows this static branch is
> > >> responsible by itself for around 40% of the jump table.
> > >>
> > >> Drop the static branch, which ends up being the same number of
> > >> instructions anyway. If Zihintpause is supported, we trade the nop from
> > >> the static branch for a div. If Zihintpause is unsupported, we trade the
> > >> jump from the static branch for (what gets interpreted as) a nop.
> > >>
> > >> Fixes: 8eb060e10185 ("arch/riscv: add Zihintpause support")
> > >> Signed-off-by: Samuel Holland <samuel@xxxxxxxxxxxx>
> > >> ---
> > >>
> > >> arch/riscv/include/asm/hwcap.h | 3 ---
> > >> arch/riscv/include/asm/vdso/processor.h | 25 ++++++++++---------------
> > >> 2 files changed, 10 insertions(+), 18 deletions(-)
> > >>
> > >> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> > >> index 6f59ec64175e..b21d46e68386 100644
> > >> --- a/arch/riscv/include/asm/hwcap.h
> > >> +++ b/arch/riscv/include/asm/hwcap.h
> > >> @@ -68,7 +68,6 @@ enum riscv_isa_ext_id {
> > >> */
> > >> enum riscv_isa_ext_key {
> > >> RISCV_ISA_EXT_KEY_FPU, /* For 'F' and 'D' */
> > >> - RISCV_ISA_EXT_KEY_ZIHINTPAUSE,
> > >> RISCV_ISA_EXT_KEY_MAX,
> > >> };
> > >>
> > >> @@ -88,8 +87,6 @@ static __always_inline int riscv_isa_ext2key(int num)
> > >> return RISCV_ISA_EXT_KEY_FPU;
> > >> case RISCV_ISA_EXT_d:
> > >> return RISCV_ISA_EXT_KEY_FPU;
> > >> - case RISCV_ISA_EXT_ZIHINTPAUSE:
> > >> - return RISCV_ISA_EXT_KEY_ZIHINTPAUSE;
> > >> default:
> > >> return -EINVAL;
> > >> }
> > >> diff --git a/arch/riscv/include/asm/vdso/processor.h b/arch/riscv/include/asm/vdso/processor.h
> > >> index 1e4f8b4aef79..789bdb8211a2 100644
> > >> --- a/arch/riscv/include/asm/vdso/processor.h
> > >> +++ b/arch/riscv/include/asm/vdso/processor.h
> > >> @@ -4,30 +4,25 @@
> > >>
> > >> #ifndef __ASSEMBLY__
> > >>
> > >> -#include <linux/jump_label.h>
> > >> #include <asm/barrier.h>
> > >> -#include <asm/hwcap.h>
> > >>
> > >> static inline void cpu_relax(void)
> > >> {
> > >> - if (!static_branch_likely(&riscv_isa_ext_keys[RISCV_ISA_EXT_KEY_ZIHINTPAUSE])) {
> > >> #ifdef __riscv_muldiv
> > >> - int dummy;
> > >> - /* In lieu of a halt instruction, induce a long-latency stall. */
> > >> - __asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy));
> > >> + int dummy;
> > >> + /* In lieu of a halt instruction, induce a long-latency stall. */
> > >> + __asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy));
> > >> #endif
> > >> - } else {
> > >> - /*
> > >> - * Reduce instruction retirement.
> > >> - * This assumes the PC changes.
> > >> - */
> > >> + /*
> > >> + * Reduce instruction retirement.
> > >> + * This assumes the PC changes.
> > >> + */
> > >> #ifdef __riscv_zihintpause
> > >> - __asm__ __volatile__ ("pause");
> > >> + __asm__ __volatile__ ("pause");
> > >> #else
> > >> - /* Encoding of the pause instruction */
> > >> - __asm__ __volatile__ (".4byte 0x100000F");
> > >> + /* Encoding of the pause instruction */
> > >> + __asm__ __volatile__ (".4byte 0x100000F");
> > >> #endif
> > >
> > > hmm, though before this part of the code was only ever accessed
> > > when the zhintpause extension was really available on the running
> > > machine while now the pause instruction is called every time.
> > >
> > > So I'm just wondering, can't this run into some "illegal instruction"
> > > thingy on machines not supporting the extension?
> >
> > No. The encoding for pause was deliberately chosen to be one of the
> > “useless” encodings of fence, with the hope that existing
> > microarchitectures might take a while to execute it and thus it would
> > still function as a slow-running instruction. It’s somewhat
> > questionable whether the div is even needed, the worst that happens is
> > cpu_relax isn’t very relaxed and you spin a bit faster. Any
> > implementations where that’s true probably also don’t have fancy
> > clock/power management anyway, and div isn’t going to be a low-power
> > operation so the only real effect is likely hammering on contended
> > atomics a bit more, and who cares about that on the low core count
> > systems we have today.
>
> thanks a lot for that explanation, which made things a lot clearer.
>
> So as you said, dropping the div part might make the function even smaller,
> though somehow part of me would want to add some sort of comment to
> the function for when the next developer stumbles over the unconditional
> use of pause :-) .
>

I agree. If that's what microarch will do, we can drop div altogether.
Though microarch may be treated as nop even if it is undesirable.
IIRC, the div was introduced for the rocket chip which would induce a
long latency stall with div instruction (zero as operands).

Does any other core or newer rocket chip actually induce a latency
stall with div instruction ?
If not, it is equivalent to NOP as well. We can definitely remove the div.
The only cores affected will be the older rocket core.

Tagging some folks to understand what their core does.

@Paul Walmsley @Guo Ren @Conor Dooley ?

(Please add anybody who may have an insight to execution flow on
existing Linux capable cores)

>
> Heiko
>
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@xxxxxxxxxxxxxxxxxxx
> http://lists.infradead.org/mailman/listinfo/linux-riscv



--
Regards,
Atish