Re: [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE

From: Ard Biesheuvel
Date: Wed Oct 28 2020 - 19:24:07 EST


On Wed, 28 Oct 2020 at 23:59, Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Wed, Oct 28, 2020 at 11:15:04PM +0100, Ard Biesheuvel wrote:
> > On Wed, 28 Oct 2020 at 22:39, Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Wed, Oct 28, 2020 at 06:15:05PM +0100, Ard Biesheuvel wrote:
> > > > Commit 3193c0836 ("bpf: Disable GCC -fgcse optimization for
> > > > ___bpf_prog_run()") introduced a __no_fgcse macro that expands to a
> > > > function scope __attribute__((optimize("-fno-gcse"))), to disable a
> > > > GCC specific optimization that was causing trouble on x86 builds, and
> > > > was not expected to have any positive effect in the first place.
> > > >
> > > > However, as the GCC manual documents, __attribute__((optimize))
> > > > is not for production use, and results in all other optimization
> > > > options being forgotten for the function in question. This can
> > > > cause all kinds of trouble, but in one particular reported case,
> > > > it causes -fno-asynchronous-unwind-tables to be disregarded,
> > > > resulting in .eh_frame info being emitted for the function.
> > > >
> > > > This reverts commit 3193c0836, and instead, it disables the -fgcse
> > > > optimization for the entire source file, but only when building for
> > > > X86 using GCC with CONFIG_BPF_JIT_ALWAYS_ON disabled. Note that the
> > > > original commit states that CONFIG_RETPOLINE=n triggers the issue,
> > > > whereas CONFIG_RETPOLINE=y performs better without the optimization,
> > > > so it is kept disabled in both cases.
> > > >
> > > > Fixes: 3193c0836 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()")
> > > > Link: https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=GFHpzoj_hCoBQ@xxxxxxxxxxxxxx/
> > > > Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
> > > > ---
> > > > include/linux/compiler-gcc.h | 2 --
> > > > include/linux/compiler_types.h | 4 ----
> > > > kernel/bpf/Makefile | 6 +++++-
> > > > kernel/bpf/core.c | 2 +-
> > > > 4 files changed, 6 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> > > > index d1e3c6896b71..5deb37024574 100644
> > > > --- a/include/linux/compiler-gcc.h
> > > > +++ b/include/linux/compiler-gcc.h
> > > > @@ -175,5 +175,3 @@
> > > > #else
> > > > #define __diag_GCC_8(s)
> > > > #endif
> > > > -
> > > > -#define __no_fgcse __attribute__((optimize("-fno-gcse")))
> > >
> > > See my reply in the other thread.
> > > I prefer
> > > -#define __no_fgcse __attribute__((optimize("-fno-gcse")))
> > > +#define __no_fgcse __attribute__((optimize("-fno-gcse,-fno-omit-frame-pointer")))
> > >
> > > Potentially with -fno-asynchronous-unwind-tables.
> > >
> >
> > So how would that work? arm64 has the following:
> >
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables -fno-unwind-tables
> >
> > ifeq ($(CONFIG_SHADOW_CALL_STACK), y)
> > KBUILD_CFLAGS += -ffixed-x18
> > endif
> >
> > and it adds -fpatchable-function-entry=2 for compilers that support
> > it, but only when CONFIG_FTRACE is enabled.
>
> I think you're assuming that GCC drops all flags when it sees __attribute__((optimize)).
> That's not the case.
>

So which flags does it drop, and which doesn't it drop? Is that
documented somewhere? Is that the same for all versions of GCC?

> > Also, as Nick pointed out, -fno-gcse does not work on Clang.
>
> yes and what's the point?
> #define __no_fgcse is GCC only. clang doesn't need this workaround.
>

Ah ok, that's at least something.

> > Every architecture will have a different set of requirements here. And
> > there is no way of knowing which -f options are disregarded when you
> > use the function attribute.
> >
> > So how on earth are you going to #define __no_fgcse correctly for
> > every configuration imaginable?
> >
> > > __attribute__((optimize(""))) is not as broken as you're claiming it to be.
> > > It has quirky gcc internal logic, but it's still widely used
> > > in many software projects.
> >
> > So it's fine because it is only a little bit broken? I'm sorry, but
> > that makes no sense whatsoever.
> >
> > If you insist on sticking with this broken construct, can you please
> > make it GCC/x86-only at least?
>
> I'm totally fine with making
> #define __no_fgcse __attribute__((optimize("-fno-gcse,-fno-omit-frame-pointer")))
> to be gcc+x86 only.
> I'd like to get rid of it, but objtool is not smart enough to understand
> generated asm without it.

I'll defer to the x86 folks to make the final call here, but I would
be perfectly happy doing

index d1e3c6896b71..68ddb91fbcc6 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -176,4 +176,6 @@
#define __diag_GCC_8(s)
#endif

+#ifdef CONFIG_X86
#define __no_fgcse __attribute__((optimize("-fno-gcse")))
+#endif

and end the conversation here, because I honestly cannot wrap my head
around the fact that you are willing to work around an x86 specific
objtool shortcoming by arbitrarily disabling some GCC optimization for
all architectures, using a construct that may or may not affect other
compiler settings in unpredictable ways, where the compiler is being
used to compile a BPF language runtime for executing BPF programs
inside the kernel.

What on earth could go wrong?
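
For illustration only, here is a minimal standalone sketch of what a
GCC/x86-only guard could look like outside the kernel tree. It is not the
patch itself: CONFIG_X86 is not available in a standalone file, so the
compiler's own __x86_64__/__i386__ macros stand in for it (an assumption),
and sum_to() is a hypothetical function standing in for ___bpf_prog_run().
The !defined(__clang__) test is needed because Clang also defines __GNUC__.

```c
/*
 * Illustrative sketch only, not kernel code. The architecture test uses
 * compiler-provided macros as a stand-in for CONFIG_X86 (assumption).
 */
#if defined(__GNUC__) && !defined(__clang__) && \
    (defined(__x86_64__) || defined(__i386__))
/* GCC on x86: request -fno-gcse for the annotated function only. */
#define __no_fgcse __attribute__((optimize("-fno-gcse")))
#else
/* Any other compiler or architecture: the annotation expands to nothing. */
#define __no_fgcse
#endif

/* Hypothetical function standing in for ___bpf_prog_run(). */
static __no_fgcse unsigned int sum_to(unsigned int n)
{
	unsigned int i, total = 0;

	for (i = 1; i <= n; i++)
		total += i;
	return total;
}
```

The guard only limits where the attribute is applied. It does not change
what the attribute does to the function's other compiler options on the
configurations where it remains in effect.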