Re: [PATCH v4 0/3] Compile-time stack frame pointer validation

From: Ingo Molnar
Date: Thu May 21 2015 - 08:12:31 EST



* Ingo Molnar <mingo@xxxxxxxxxx> wrote:

> Especially on modern x86 CPUs with stack engines (latest Intel and
> AMD CPUs) that keep ESP updates out of the later stages of
> execution pipelines, going from RBP framepointers to direct ESP use
> is beneficial to performance and compresses I$ footprint as well:
>
>     text    data     bss      dec     hex filename
> 12150606 2565544 1634304 16350454  f97cf6 linux-CONFIG_FRAME_POINTERS=n/vmlinux
> 13282884 2571744 1617920 17472548 10a9c24 linux-CONFIG_FRAME_POINTERS=y/vmlinux

Correction: I ran that with a 1-byte alignment patch still applied.

I reran all the numbers with the default 16-byte alignment as well,
and the gap between framepointers and no-framepointers becomes smaller,
but the various trends and conclusions still hold.

Here are the updated numbers:

    text    data     bss      dec     hex filename
13548564 2571744 1617920 17738228 10ea9f4 linux-CONFIG_FRAME_POINTERS=n/vmlinux
13797773 2571744 1617920 17987437 112776d linux-CONFIG_FRAME_POINTERS=y/vmlinux
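
As a quick sanity check on the table above, the columns of the size output can be cross-checked against each other: 'dec' should be text + data + bss, and 'hex' should be the same total in base 16. The sketch below is only illustrative: the helper name check_row and the short row labels are made up for this example, and the numbers are copied by hand from the table.

```python
# Rows copied by hand from the updated size output:
# (text, data, bss, dec, hex, label). The labels are shorthand, not real paths.
rows = [
    (13548564, 2571744, 1617920, 17738228, "10ea9f4", "FRAME_POINTERS=n"),
    (13797773, 2571744, 1617920, 17987437, "112776d", "FRAME_POINTERS=y"),
]


def check_row(text, data, bss, dec, hex_str):
    """Return True if dec == text + data + bss and hex_str is dec in base 16."""
    total = text + data + bss
    return total == dec and int(hex_str, 16) == dec


for text, data, bss, dec, hex_str, label in rows:
    status = "consistent" if check_row(text, data, bss, dec, hex_str) else "MISMATCH"
    print(label, status)

# Absolute growth of the text section when frame pointers are enabled.
text_growth = rows[1][0] - rows[0][0]
print("extra text bytes with frame pointers:", text_growth)
```

Only the text column differs between the two builds here, so the whole size difference comes from generated code.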

> Here's the I$ cachemiss rate with the 'vfs-mix' workload that I used
> in the -falign-functions measurements, for
> CONFIG_FRAMEPOINTERS=y, on Intel Sandy Bridge (best of 9x10 runs):
>
> #
> # CONFIG_FRAMEPOINTERS=y
> #
> Performance counter stats for 'system wide' (10 runs):
>
> 728,328,347 L1-icache-load-misses ( +- 0.08% ) (100.00%)
> 11,891,931,664 instructions ( +- 0.00% )
> 300,023 context-switches ( +- 0.00% )
>
> 7.324048170 seconds time elapsed ( +- 0.09% )


Performance counter stats for 'system wide' (10 runs):

701,525,006 L1-icache-load-misses ( +- 0.06% ) (100.00%)
11,891,793,196 instructions ( +- 0.01% )
300,036 context-switches ( +- 0.00% )

7.354372294 seconds time elapsed ( +- 0.82% )

>
> ... and these are the I$ miss perf stats from running the same
> workload on a CONFIG_FRAMEPOINTERS=n kernel:
>
> #
> # CONFIG_FRAMEPOINTERS is not set
> #
> Performance counter stats for 'system wide' (10 runs):
>
> 687,758,078 L1-icache-load-misses ( +- 0.10% ) (100.00%)
> 10,984,908,013 instructions ( +- 0.01% )
> 300,021 context-switches ( +- 0.00% )
>
> 7.120867260 seconds time elapsed ( +- 0.29% )

Performance counter stats for 'system wide' (10 runs):

685,107,089 L1-icache-load-misses ( +- 0.08% ) (100.00%)
10,983,861,590 instructions ( +- 0.01% )
300,031 context-switches ( +- 0.00% )

7.120738452 seconds time elapsed ( +- 0.35% )

> So if we disable frame pointers, then on this workload:
>
> - the kernel text size is 9.3% smaller
> - the number of instructions executed went down by about 8.2%
> - the cachemiss rate went down by about 5.9%
> - performance went up by about 2.8%.

- the kernel text size is 1.8% smaller: with 16-byte alignment
there's quite some extra free space the frame pointer code can
grow into, which reduces the size win.

- the number of instructions executed went down by about 8.2% (as
expected, this is independent of alignment.)

- the cachemiss rate went down by about 2.7%: this is a smaller
win again, partly because of the 'free space' 16-byte alignment
gives us.

- the best 'time elapsed' numbers out of 10 runs show a speedup of
2.0% - close to the 2.8% with 1-byte alignment.
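
For anyone who wants to redo the arithmetic, the text-size, instruction-count and best-runtime deltas can be recomputed from the figures in this mail. This is a rough sketch only: the helper name rel_diff is made up, and taking the no-framepointers value as the baseline is an assumption about how the percentages were derived. Depending on baseline and rounding, the results can differ from the quoted figures by about a tenth of a percent.

```python
def rel_diff(with_fp, without_fp):
    """Overhead of the frame-pointer build, in percent of the no-FP value."""
    return (with_fp - without_fp) / without_fp * 100.0


# Text section sizes from the updated size output (16-byte alignment).
text_pct = rel_diff(13797773, 13548564)

# Instruction counts from the two perf stat blocks (16-byte alignment).
insn_pct = rel_diff(11891793196, 10983861590)

# Best 'time elapsed' out of the ten res2.txt runs for each kernel.
best_pct = rel_diff(7.258946461, 7.120738452)

print(f"text size:     {text_pct:.2f}%")
print(f"instructions:  {insn_pct:.2f}%")
print(f"best runtime:  {best_pct:.2f}%")
```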

> The speedup is actually even better than 2.8%, if you look at
> average execution time:
>
> linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.324048170 seconds time elapsed ( +- 0.09% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.470166715 seconds time elapsed ( +- 1.01% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.365047474 seconds time elapsed ( +- 0.25% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.828223324 seconds time elapsed ( +- 2.04% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.427164489 seconds time elapsed ( +- 0.70% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.385565350 seconds time elapsed ( +- 0.35% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.560782318 seconds time elapsed ( +- 1.68% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.399741309 seconds time elapsed ( +- 0.74% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.303746766 seconds time elapsed ( +- 0.04% )
>
> avg = 7.451609

linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.300875812 seconds time elapsed ( +- 0.17% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.491652338 seconds time elapsed ( +- 1.33% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.307877300 seconds time elapsed ( +- 0.20% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.258946461 seconds time elapsed ( +- 0.23% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.295113779 seconds time elapsed ( +- 0.30% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.283375859 seconds time elapsed ( +- 0.21% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.319320205 seconds time elapsed ( +- 0.38% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.354372294 seconds time elapsed ( +- 0.82% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.308955558 seconds time elapsed ( +- 0.26% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.295267101 seconds time elapsed ( +- 0.26% )

avg=7.32

> linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.201498813 seconds time elapsed ( +- 0.86% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.120867260 seconds time elapsed ( +- 0.29% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.141642635 seconds time elapsed ( +- 0.15% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.217213506 seconds time elapsed ( +- 0.85% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.163046581 seconds time elapsed ( +- 0.56% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.128939439 seconds time elapsed ( +- 0.23% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.256172853 seconds time elapsed ( +- 0.82% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.122946768 seconds time elapsed ( +- 0.23% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.126018578 seconds time elapsed ( +- 0.18% )
>
> avg = 7.164260

linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.135061084 seconds time elapsed ( +- 0.39% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.132738388 seconds time elapsed ( +- 0.34% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.174334895 seconds time elapsed ( +- 0.32% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.215143851 seconds time elapsed ( +- 0.71% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.131166029 seconds time elapsed ( +- 0.19% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.270427197 seconds time elapsed ( +- 1.22% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.120738452 seconds time elapsed ( +- 0.35% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.168856127 seconds time elapsed ( +- 0.27% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.268637173 seconds time elapsed ( +- 1.28% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.178431781 seconds time elapsed ( +- 0.32% )

avg=7.18
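
The two averages above can be reproduced from the res2.txt lines with a few lines of Python. This is just a sketch: the variable names are made up for the example, the values are copied by hand from the run lists, and the relative difference is taken against the no-framepointers average, which is an assumption about the baseline.

```python
from statistics import mean

# 'seconds time elapsed' values from the ten res2.txt runs of each kernel.
fp_runs = [
    7.300875812, 7.491652338, 7.307877300, 7.258946461, 7.295113779,
    7.283375859, 7.319320205, 7.354372294, 7.308955558, 7.295267101,
]
nofp_runs = [
    7.135061084, 7.132738388, 7.174334895, 7.215143851, 7.131166029,
    7.270427197, 7.120738452, 7.168856127, 7.268637173, 7.178431781,
]

avg_fp = mean(fp_runs)
avg_nofp = mean(nofp_runs)

# Relative difference of the averages, in percent of the no-FP average.
avg_speedup_pct = (avg_fp - avg_nofp) / avg_nofp * 100.0

print(f"avg with framepointers:    {avg_fp:.2f}")
print(f"avg without framepointers: {avg_nofp:.2f}")
print(f"difference:                {avg_speedup_pct:.1f}%")
```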

> Then with framepointers disabled this workload gets faster by 4.0%
> on average.

With 16-byte alignment the average gets faster by about 2.0% (7.32 vs.
7.18 seconds).

The conclusions are unchanged:

> The average result is also pretty stable in the no-framepointers
> case, while it fluctuates more in the framepointers case. (and this
> is why the 'best runtime' favors the framepointers case - the
> average is closer to reality.)
>
> So the performance advantages of not doing framepointers are not
> something we can ignore IMHO: but obviously performance isn't
> everything - so if stack unwinding is not robust, then we need and
> want frame pointers.

Thanks,

Ingo