Re: [PATCH 10/22] bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()

From: Josh Poimboeuf
Date: Tue Jul 16 2019 - 19:03:02 EST


On Tue, Jul 16, 2019 at 11:15:54AM -0700, Nick Desaulniers wrote:
> On Sun, Jul 14, 2019 at 5:37 PM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> >
> > On x86-64, with CONFIG_RETPOLINE=n, GCC's "global common subexpression
> > elimination" optimization results in ___bpf_prog_run()'s jumptable code
> > changing from this:
> >
> >     select_insn:
> >         jmp *jumptable(, %rax, 8)
> >         ...
> >     ALU64_ADD_X:
> >         ...
> >         jmp *jumptable(, %rax, 8)
> >     ALU_ADD_X:
> >         ...
> >         jmp *jumptable(, %rax, 8)
> >
> > to this:
> >
> >     select_insn:
> >         mov jumptable, %r12
> >         jmp *(%r12, %rax, 8)
> >         ...
> >     ALU64_ADD_X:
> >         ...
> >         jmp *(%r12, %rax, 8)
> >     ALU_ADD_X:
> >         ...
> >         jmp *(%r12, %rax, 8)
> >
> > The jumptable address is placed in a register once, at the beginning of
> > the function. The function execution can then go through multiple
> > indirect jumps which rely on that same register value. This has a few
> > issues:
> >
> > 1) Objtool isn't smart enough to be able to track such a register value
> > across multiple recursive indirect jumps through the jump table.
> >
> > 2) With CONFIG_RETPOLINE enabled, this optimization actually results in
> > a small slowdown. I measured a ~4.7% slowdown in the test_bpf
> > "tcpdump port 22" selftest.
> >
> > This slowdown is actually predicted by the GCC manual:
> >
> >     Note: When compiling a program using computed gotos, a GCC
> >     extension, you may get better run-time performance if you
> >     disable the global common subexpression elimination pass by
> >     adding -fno-gcse to the command line.
> >
> > So just disable the optimization for this function.
> >
> > Fixes: e55a73251da3 ("bpf: Fix ORC unwinding in non-JIT BPF code")
> > Reported-by: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
> > Signed-off-by: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
> > Acked-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> > ---
> > Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
> > Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
> > ---
> > include/linux/compiler-gcc.h | 2 ++
> > include/linux/compiler_types.h | 4 ++++
> > kernel/bpf/core.c | 2 +-
> > 3 files changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> > index e8579412ad21..d7ee4c6bad48 100644
> > --- a/include/linux/compiler-gcc.h
> > +++ b/include/linux/compiler-gcc.h
> > @@ -170,3 +170,5 @@
> > #else
> > #define __diag_GCC_8(s)
> > #endif
> > +
> > +#define __no_fgcse __attribute__((optimize("-fno-gcse")))
>
> + Miguel, maintainer of compiler_attributes.h
> I wonder if the optimize attributes can be feature detected?
> Is -fno-gcse supported all the way back to GCC 4.6?

Yeah, from snooping in the GCC tree, it looks like -fno-gcse has been
around for 18+ years.
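
On the feature-detection question, here is a rough userspace sketch, not
taken from the patch. The toy_* names and the __has_attribute() check are
made up for illustration; the patch itself defines __no_fgcse
unconditionally in compiler-gcc.h. One caveat with this approach:
__has_attribute() only exists in GCC 5 and later, so on GCC 4.x the sketch
would silently fall back to an empty macro even though the option is there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical detection; compilers without __has_attribute get 0. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Only GCC proper is assumed to accept optimize("-fno-gcse"). */
#if defined(__GNUC__) && !defined(__clang__) && __has_attribute(__optimize__)
#define toy_no_fgcse __attribute__((optimize("-fno-gcse")))
#else
#define toy_no_fgcse
#endif

enum toy_op { TOY_ADD, TOY_SUB, TOY_EXIT, TOY_NR_OPS };

struct toy_insn {
	enum toy_op op;
	long imm;
};

/*
 * Toy interpreter using computed gotos: each handler ends with its own
 * indirect jump through the jump table, like ___bpf_prog_run() does.
 */
static toy_no_fgcse long toy_run(const struct toy_insn *insn)
{
	static const void * const jumptable[TOY_NR_OPS] = {
		[TOY_ADD]  = &&do_add,
		[TOY_SUB]  = &&do_sub,
		[TOY_EXIT] = &&do_exit,
	};
	long acc = 0;

#define TOY_DISPATCH()					\
	do {						\
		assert(insn->op < TOY_NR_OPS);		\
		goto *jumptable[insn->op];		\
	} while (0)

	TOY_DISPATCH();
do_add:
	acc += insn->imm;
	insn++;
	TOY_DISPATCH();
do_sub:
	acc -= insn->imm;
	insn++;
	TOY_DISPATCH();
do_exit:
	return acc;
#undef TOY_DISPATCH
}
```

Feeding toy_run() a program of TOY_ADD 50, TOY_SUB 8, TOY_EXIT should
return 42 whether or not the attribute is applied; the attribute is only
meant to affect how the jump table address gets loaded, not the result.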

--
Josh