Re: static_branch/jump_label vs branch merging
From: David Malcolm
Date: Fri Apr 09 2021 - 09:48:44 EST
On Fri, 2021-04-09 at 15:13 +0200, Peter Zijlstra wrote:
> On Fri, Apr 09, 2021 at 03:01:50PM +0200, Peter Zijlstra wrote:
> > On Fri, Apr 09, 2021 at 02:03:46PM +0200, Peter Zijlstra wrote:
> > > On Fri, Apr 09, 2021 at 07:55:42AM -0400, David Malcolm wrote:
> >
> > > > Sorry if this is a dumb question, but does the function
> > > > attribute:
> > > > __attribute__ ((pure))
> > > > help here? It's meant to allow multiple calls to a predicate
> > > > to be
> > > > merged - though I'd be nervous of using it here, the predicate
> > > > isn't
> > > > 100% pure, since AIUI the whole point of what you've built is
> > > > for
> > > > predicates that very rarely change - but can change
> > > > occasionally.
> > >
> > > I actually tried that, but it doesn't seem to work. Given the
> > > function
> > > arguments are all compile time constants it should DTRT AFAICT,
> > > but
> > > alas.
> >
> > FWIW, I tried the below patch and GCC-10.2.1 on current tip/master.
>
> I also just tried __attribute__((__const__)), which is stronger still
> than __pure__ and that's also not working :/
>
> I then also tried to replace the __buildin_types_compatible_p() magic
> in
> static_branch_unlikely() with _Generic(), but still no joy.
>
[..snip...]
You tried __pure on arch_static_branch; did you try it on
static_branch_unlikely?
With the caveat that my knowledge of GCC's middle-end is mostly about
implementing warnings, rather than optimization, I did some
experimentation, with gcc trunk on x86_64 FWIW.
Given:
int __attribute__((pure)) foo(void);
int t(void)
{
int a;
if (foo())
a++;
if (foo())
a++;
if (foo())
a++;
return a;
}
At -O1 and above this is optimized to a single call to foo, returning 0
or 3 accordingly.
-fdump-tree-all shows that it's the "fre1" pass that eliminates the
subsequent calls to foo, replacing them with reuses of the result of
the first call.
This is in gcc/tree-ssa-sccvn.c, a value-numbering pass.
I think you want to somehow "teach" the compiler that:
static_branch_unlikely(&sched_schedstats)
is "pure-ish", that for some portion of the surrounding code that you
want the result to be treated as pure - though I suspect compiler
maintainers with more experience than me are thinking "but which
portion? what is it safe to assume, and what will users be annoyed
about if we optimize away? what if t itself is inlined somewhere?" and
similar concerns.
Or maybe the asm stmt itself could somehow be marked as pure??? (with
similar concerns about semantics as above)
Hope this is constructive
Dave