Performance Testing
===================
I've run some limited performance benchmarks:
First, a real-world benchmark that causes a lot of page table manipulation (and
therefore where we would expect to see a regression, if we are going to see one
anywhere): kernel compilation. It barely registers a change. Values are times,
so smaller is better. All relative to base-4k:
| config      | kern mean | kern stdev | user mean | user stdev | real mean | real stdev |
|-------------|-----------|------------|-----------|------------|-----------|------------|
| base-4k     |      0.0% |       1.1% |      0.0% |       0.3% |      0.0% |       0.3% |
| compile-4k  |     -0.2% |       1.1% |     -0.2% |       0.3% |     -0.1% |       0.3% |
| boot-4k     |      0.1% |       1.0% |     -0.3% |       0.2% |     -0.2% |       0.2% |
The Speedometer JavaScript benchmark also shows no significant change. Values
are runs per minute, so bigger is better. All relative to base-4k:
| config | mean | stdev |
|-------------|---------|---------|
| base-4k | 0.0% | 0.8% |
| compile-4k | 0.4% | 0.8% |
| boot-4k | 0.0% | 0.9% |
Finally, I've run some microbenchmarks known to stress page table manipulations
(originally from David Hildenbrand). The fork test maps/allocs 1G of anon
memory, then measures the cost of fork(). The munmap test maps/allocs 1G of anon
memory, then measures the cost of munmap()ing it.
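As a rough illustration of what those tests exercise, here is a minimal sketch
of the measurement pattern, not the actual benchmark code; the 1G size, the
now_sec() helper and the single-iteration timing are assumptions made purely
for the example:

```c
/*
 * Minimal sketch of the fork/munmap measurement pattern described above.
 * Maps and faults in 1G of anonymous memory, then times fork() (child
 * exits immediately) and munmap().
 */
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define SIZE (1UL << 30)	/* 1G of anon memory */

static double now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
	char *mem;
	double t;
	pid_t pid;

	mem = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(mem, 1, SIZE);	/* fault in all pages */

	t = now_sec();
	pid = fork();		/* cost dominated by copying page tables */
	if (pid == 0)
		_exit(0);
	printf("fork:   %f s\n", now_sec() - t);
	waitpid(pid, NULL, 0);

	t = now_sec();
	munmap(mem, SIZE);	/* cost dominated by tearing down page tables */
	printf("munmap: %f s\n", now_sec() - t);

	return 0;
}
```

The real benchmarks repeat the measurement over many iterations to produce the
mean/stdev figures reported below; this sketch times a single iteration only.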
The fork test is known to be extremely sensitive to any changes that cause
instructions to be aligned differently in cachelines. When using this test for
other changes, I've seen double-digit regressions for the slightest thing, so a
12% regression on this test is actually fairly good. This likely represents the
extreme worst case for regressions that will be observed across other
microbenchmarks (famous last words). Values are times, so smaller is better.
All relative to base-4k: