Re: static_branch/jump_label vs branch merging

From: Peter Zijlstra
Date: Fri Apr 09 2021 - 14:40:19 EST


On Fri, Apr 09, 2021 at 09:48:33AM -0400, David Malcolm wrote:
> You tried __pure on arch_static_branch; did you try it on
> static_branch_unlikely?

static_branch_unlikely() is a CPP macro that expands to a statement
expression or, with the later patch, to a _Generic(). Since pure is a
function attribute, I'm not sure how to apply it to either of them.

I was hoping the attribute would percolate through, so to speak.
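A minimal sketch of the situation, using hypothetical names (`struct demo_key`, `demo_key_enabled()`, `demo_branch_unlikely()`) rather than the kernel's real static_key API. A function attribute such as `__attribute__((pure))` needs a function declaration to attach to, and a statement-expression macro has none. One option is to call an attributed helper from inside the macro and hope the property carries through.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct static_key. */
struct demo_key {
	bool enabled;
};

/*
 * A function attribute needs a function declaration to attach to.
 * pure: no side effects; the result depends only on the arguments
 * and on global memory.
 */
static inline __attribute__((pure)) bool
demo_key_enabled(const struct demo_key *k)
{
	return k->enabled;
}

/*
 * A statement-expression macro (GCC/Clang extension) has no declaration,
 * so there is nowhere to put __attribute__((pure)). It can only call an
 * attributed helper and rely on the property percolating through.
 */
#define demo_branch_unlikely(k) ({ __builtin_expect(demo_key_enabled(k), 0); })
```

Here `demo_key_enabled()` reads an ordinary `bool`. The real kernel construct uses a patched jump instead, which this sketch does not attempt to model.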

> With the caveat that my knowledge of GCC's middle-end is mostly about
> implementing warnings, rather than optimization, I did some
> experimentation, with gcc trunk on x86_64 FWIW.
>
> Given:
>
> int __attribute__((pure)) foo(void);
>
> int t(void)
> {
> 	int a = 0;
>
> 	if (foo())
> 		a++;
> 	if (foo())
> 		a++;
> 	if (foo())
> 		a++;
> 	return a;
> }
>
> At -O1 and above this is optimized to a single call to foo, returning 0
> or 3 accordingly.
>
> -fdump-tree-all shows that it's the "fre1" pass that eliminates the
> subsequent calls to foo, replacing them with reuses of the result of
> the first call.
>
> This is in gcc/tree-ssa-sccvn.c, a value-numbering pass.
>
> I think you want to somehow "teach" the compiler that:
> static_branch_unlikely(&sched_schedstats)
> is "pure-ish", that for some portion of the surrounding code that you
> want the result to be treated as pure - though I suspect compiler
> maintainers with more experience than me are thinking "but which
> portion? what is it safe to assume, and what will users be annoyed
> about if we optimize away? what if t itself is inlined somewhere?" and
> similar concerns.
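The effect of the fre1 pass on `t()` from the quoted mail can be written out by hand. This is an illustrative sketch, not GCC's actual output. `g_flag` and `t_after_fre()` are hypothetical names, and `a` is initialized to 0 so the result is defined. The rewrite is only valid because `foo()` is pure: nothing between the calls writes memory, so every call must return the same value.

```c
/* Hypothetical global read by foo(); pure functions may read globals. */
int g_flag;

/* pure: no side effects, result depends only on arguments and globals. */
int __attribute__((pure)) foo(void)
{
	return g_flag;
}

/* The example from the quoted mail, with 'a' initialized. */
int t(void)
{
	int a = 0;

	if (foo())
		a++;
	if (foo())
		a++;
	if (foo())
		a++;
	return a;
}

/*
 * Hand-written equivalent of what value numbering achieves here: the
 * three calls receive the same value number, so one call suffices and
 * the result is either 0 or 3.
 */
int t_after_fre(void)
{
	int v = foo();

	return v ? 3 : 0;
}
```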

Right, pure or even const. As for the scope: as wide as possible. It
literally is a global constant; the value returned is the same
everywhere.

All we need GCC to do for the static_branch construct is to emit both
branches; that is, it must not treat the result as a compile-time
constant and elide the other branch. But it can consider consecutive
calls (as far and wide as it wants) to return the same value.
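A sketch of the transformation being asked for. A plain `bool` stands in for the static key, and hypothetical `do_a()`/`do_b()` helpers record what ran. The merged form still emits both the enabled and the disabled path; only the redundant second test disappears. One assumption about GCC semantics, worth checking against its documentation: with `pure`, the compiler must assume that an intervening call like `do_a()` may modify the memory the test reads, so it cannot reuse the first result across that call. With `const`, it can.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for a static key. In the kernel the value only
 * changes via runtime code patching, never between two tests.
 */
static bool demo_key_on;

/* Hypothetical helpers that record what ran, so both forms can be compared. */
static int trace[4];
static int ntrace;

static void do_a(void) { trace[ntrace++] = 'a'; }
static void do_b(void) { trace[ntrace++] = 'b'; }

/* What the source says: two independent tests of the same key. */
void unmerged(void)
{
	if (demo_key_on)
		do_a();
	if (demo_key_on)
		do_b();
}

/*
 * The form GCC would be allowed to emit if consecutive tests were known
 * to return the same value. Both paths (key on / key off) still exist;
 * only the second, redundant test is gone. This is valid only because
 * do_a() does not change the key.
 */
void merged(void)
{
	if (demo_key_on) {
		do_a();
		do_b();
	}
}
```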

> Or maybe the asm stmt itself could somehow be marked as pure??? (with
> similar concerns about semantics as above)

Yeah, not sure; someone with more clue will have to tell us what, if
anything, beyond marking it pure or const is required. Perhaps that
attribute is sufficient and the compiler just isn't optimizing for an
unrelated reason.

Regards,

Peter