On Thu, 2012-10-04 at 14:41 +0200, Avi Kivity wrote:
On 10/04/2012 12:49 PM, Raghavendra K T wrote:
On 10/03/2012 10:35 PM, Avi Kivity wrote:
On 10/03/2012 02:22 PM, Raghavendra K T wrote:
So I think it's worth trying again with ple_window of 20000-40000.
Hi Avi,
I ran different benchmarks with increased ple_window values, and the
results do not seem to favor increasing ple_window.
Thanks for testing! Comments below.
Results:
16 core PLE machine with 16 vcpu guest.
base kernel = 3.6-rc5 + ple handler optimization patch
base_pleopt_8k = base kernel + ple window = 8k
base_pleopt_16k = base kernel + ple window = 16k
base_pleopt_32k = base kernel + ple window = 32k
Percentage improvements of benchmarks w.r.t base_pleopt with
ple_window = 4096
               base_pleopt_8k   base_pleopt_16k   base_pleopt_32k
-----------------------------------------------------------------
kernbench_1x      -5.54915         -15.94529         -44.31562
kernbench_2x      -7.89399         -17.75039         -37.73498
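
For reference, a minimal sketch of how a percentage figure like those in
the table could be computed. It assumes the underlying metric is elapsed
time (lower is better) and that each result is compared against the
ple_window = 4096 baseline; the thread does not state the exact formula.
The function name and the sample timings are hypothetical, not
measurements from these runs.

```python
def pct_improvement(baseline_secs: float, new_secs: float) -> float:
    """Percentage improvement of a run over the baseline.

    Assumes a lower-is-better metric such as elapsed time, so a
    negative result means the new configuration is slower.
    """
    if baseline_secs <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline_secs - new_secs) / baseline_secs


if __name__ == "__main__":
    # Hypothetical timings for illustration only.
    baseline = 100.0   # seconds with ple_window = 4096
    candidate = 110.0  # seconds with a larger ple_window
    print(f"{pct_improvement(baseline, candidate):.2f}%")
```

Under this convention, a negative number in the table is a degradation
relative to the baseline.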
So, 44% degradation even with no overcommit? That's surprising.
Yes. Kernbench was run with #threads = #vcpu * 2 as usual. Is spending
8 times the original ple_window cycles across 16 vcpus significant?
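
The "8 times" figure follows from the window sizes. A small sketch of
the arithmetic, assuming "8k", "16k" and "32k" mean 8192, 16384 and
32768 cycles (the thread gives only the shorthand):

```python
BASE_PLE_WINDOW = 4096  # baseline ple_window used for comparison in the thread

# Assumed expansion of the shorthand labels into cycle counts.
windows = {"8k": 8 * 1024, "16k": 16 * 1024, "32k": 32 * 1024}

# How many times longer each setting spins before a PLE exit,
# relative to the baseline window.
ratios = {label: cycles // BASE_PLE_WINDOW for label, cycles in windows.items()}

if __name__ == "__main__":
    for label, ratio in ratios.items():
        print(f"ple_window {label}: {ratio}x baseline")
```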
A PLE exit when not overcommitted cannot do any good; it is better to
spin in the guest rather than look for candidates on the host. In fact,
when we benchmark we often disable PLE completely.
Agreed. However, I really do not understand why kernbench regressed
with a bigger ple_window. It should stay the same or improve. Raghu, do
you have perf data for the kernbench runs?