Re: [PATCH] x86: Turn off GCC branch probability heuristics
From: Ingo Molnar
Date: Sun Apr 12 2015 - 03:41:46 EST
* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Sat, Apr 11, 2015 at 11:57 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >
> > I thinks its just the no-guess one:
> >
> > text data dec patch reduction
> > 7563475 1781048 10302987
> > 7192973 1780024 9931461 no-guess -4.8%
> > 7354819 1781048 958464 align-1 -2.7%
> > 7192973 1780024 9931461 no-guess + align-1 -4.8%
>
> Yeah, a 5% code expansion is a big deal. Sadly, it looks like
> 'no-guess' also disables our explicit likely/unlikely handling.
>
> Damn. If it actually honored likely/unlikely, then we should just do
> it - and manually fix up any places where we really care.
>
> But the fact that it apparently entirely disables not just the
> guesses, but our *explicit* likely/unlikely, means that we can't fix
> up the mistakes.
>
> And in many of the hot codepaths that likely/unlikely really does
> matter. Some of our hottest paths have known "this basically never
> happens" situations that we do *not* want to break up our L1 I$ over.
> There's a number of functions that have been optimized to really
> generate good code, and "-fno-guess-branch-probability" disables those
> manual optimizations.
>
> So we'd have no way to fix it for the cases that matter.
>
> Sad.
>
> It might be worth bringing this up with some gcc people. I added Jakub
> to the cc. Any other gcc people suggestions?
(Skip to the 'More numbers' section below to see more measurements.)
So what would be nice to have is if GCC had an optimization option to
disable all branch probability heuristics (like
-fno-guess-branch-probability), except explicit __builtin_expect()
hints (which -fno-guess-branch-probability unfortunately disables).
So what would be useful to have is a -fno-branch-probability-heuristics
GCC option or so, or a "--param branch-probability-heuristics=0"
option.
I found one related option:
--param predictable-branch-outcome=N
predictable-branch-outcome
When branch is predicted to be taken with probability
lower than this threshold (in percent), then it is
considered well predictable. The default is 10.
I tried the values 1 and 0, on a -O2 kernel (i.e. guess-branch-probability
was enabled), in the hope that it maybe turns off all non-__builtin_expect()
branch heuristics, but it only had very minor size impact in the 0.001% range.
More numbers:
-------------
I also measured the effect of our __builtin_expect()
(likely()/unlikely()) branch annotations.
As an experiment I mapped both likely()/unlikely() to the four
possible __builtin_expect() probability settings:
likely(): __builtin_expect(, 1) unlikely(): __builtin_expect(, 0)
likely(): __builtin_expect(, 0) unlikely(): __builtin_expect(, 1)
likely(): __builtin_expect(, 0) unlikely(): __builtin_expect(, 0)
likely(): __builtin_expect(, 1) unlikely(): __builtin_expect(, 1)
(the first mapping is the only one that makes sense, and it is the one
that is used by the kernel currently.)
The goal of mixing up the probability mappings was to measure the full
'range' of code size impact that the kernel's explicit hints are
causing, versus the impact that GCC's own branch probability
heuristics are causing:
text data bss dec filename
12566383 1617840 1089536 15273759 vmlinux.expect=10 [==vanilla]
12460250 1617840 1089536 15167626 vmlinux.expect=01
12563332 1617840 1089536 15270708 vmlinux.expect=00
12463035 1617840 1089536 15170411 vmlinux.expect=11
12533382 1617840 1089536 15240758 vmlinux.no-expect
11923529 1617840 1089536 14630905 vmlinux.-fno-guess-branch-probability
[ This was done on a v4.1-rc7-ish vanilla -O2 kernel (no alignment
tweaks), using 'make defconfig' and 'make kvmconfig' on x86-64 to
turn it into a minimally bootable kernel. I used GCC 4.9.1. ]
the 'vmlinux.none' kernel is a vanilla kernel with all
__builtin_expect() hints removed: i.e. GCC heuristics are deciding all
branch probabilities in the kernel.
So the code size 'range' that the kernel's own probability hints are
moving in is around 0.4%:
- the 'vanilla' mapping is the largest (not unexpectedly)
- the 'inverse' mapping is the smallest, by 0.8%
- the 00 and 11 mappings are about mid range, 0.4% away from the
extremes
- the 'none' mapping is also mid-range, which too is somewhat
expected: GCC heuristics would pick only part of our hints.
But note how the 'no GCC heuristics at all' setting reduces size
brutally, by 5.4%.
So if we were able to do that, while keeping __builtin_expect(), and
used our own heuristics only via __builtin_expect(), then we could
still possibly expect a total code size shrinkage of around 5.0%:
-5.4% code size reduction that comes from removal of all hints,
+0.4% code size increase between the 'none' and the '10' mappings
in the experiment above.
Note that even if I apply all the align=1 patches,
-fno-guess-branch-probability on top of that still gives me 2.6% code
size savings:
text data bss dec filename
12566383 1617840 1089536 15273759 vmlinux.expect=10 [==vanilla]
11923529 1617840 1089536 14630905 vmlinux.-fno-guess-branch-probability
11903663 1617840 1089536 14611039 vmlinux.align=1
11646102 1617840 1089536 14353478 vmlinux.align=1+fno-guess-branch-probability
The smallest vmlinux has:
- about 41,000 functions ('tT' symbols in System.map)
- about 300,000 branch/jump instructions (all objdump -d asm mnemonics starting with 'j')
- about 165,000 function calls (all objdump -d asm mnemonics matching 'call')
- about 2,330,000 instructions (all objdump -d asm mnemonics)
With align=1, GCC's heuristics added about 1,200 new branches and
1,350 new function calls, 76,900 instructions, altogether +257,561
bytes of code:
# of x86 instructions
2549742 vmlinux.expect=10 [==vanilla]
2391069 vmlinux.-fno-guess-branch-probability
2411568 vmlinux.align=1
2334595 vmlinux.align=1+fno-guess-branch-probability
(For completeness, the patch generating the smallest kernel is
attached below.)
The takeaway: the code size savings above are 7.9%. Even if we
restored __builtin_expect() hints, we'd probably still see a combined
7.5% code shrinkage.
That would be a rather significant I$ win, with very little cost that
I can see!
Thanks,
Ingo
---
arch/x86/Makefile | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 5ba2d9ce82dc..a6d3feb90b97 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -77,10 +77,25 @@ else
KBUILD_AFLAGS += -m64
KBUILD_CFLAGS += -m64
+ # Pack jump targets tightly, don't align them to the default 16 bytes:
+ KBUILD_CFLAGS += -falign-jumps=1
+
+ # Pack functions tightly as well:
+ KBUILD_CFLAGS += -falign-functions=1
+
+ # Pack loops tightly as well:
+ KBUILD_CFLAGS += -falign-loops=1
+
# Don't autogenerate traditional x87 instructions
KBUILD_CFLAGS += $(call cc-option,-mno-80387)
KBUILD_CFLAGS += $(call cc-option,-mno-fp-ret-in-387)
+ #
+ # Don't guess branch probabilities, follow the code and unlikely()/likely() hints,
+ # which reduces vmlinux size by about 5.4%:
+ #
+ KBUILD_CFLAGS += -fno-guess-branch-probability
+
# Use -mpreferred-stack-boundary=3 if supported.
KBUILD_CFLAGS += $(call cc-option,-mpreferred-stack-boundary=3)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/