RE: [RFC, PATCH 0/24] VMI i386 Linux virtualization interface proposal

From: Anne Holler
Date: Mon Mar 20 2006 - 17:05:06 EST


[Apologies for the resend: the earlier email with HTML attachments was
rejected. Resending with text attachments.]

>From: Zachary Amsden [mailto:zach@xxxxxxxxxx]
>Sent: Monday, March 13, 2006 9:58 AM

>In OLS 2005, we described the work that we have been doing in VMware
>with respect to a common interface for paravirtualization of Linux. We
>shared the general vision in Rik's virtualization BoF.

>This note is an update on our further work on the Virtual Machine
>Interface, VMI. The patches provided have been tested on 2.6.16-rc6.
>We are currently recollecting performance information for the new -rc6
>kernel, but expect our numbers to match previous results, which showed
>no impact whatsoever on macro benchmarks, and nearly negligible impact
>on microbenchmarks.

Folks,

I'm a member of the performance team at VMware & I recently did a
round of testing measuring the performance of a set of benchmarks
on the following two Linux variants, both running natively:
1) 2.6.16-rc6 including VMI + 64MB hole
2) 2.6.16-rc6 not including VMI + no 64MB hole
The intent was to measure the overhead of VMI calls on native runs.
Data was collected on both P4 & Opteron boxes. The workloads used
were dbench/1client, netperf/receive+send, UP+SMP kernel compile,
lmbench, & some VMware in-house kernel microbenchmarks. The CPU(s)
were pegged for all workloads except netperf, for which I include
CPU utilization measurements.

Attached please find a text file presenting the benchmark results,
expressed as the ratio of 1) to 2), along with the raw scores
given in brackets. System configurations & benchmark descriptions
are given at the end of the page; more details are available on
request. Also attached for reference is a text file giving the
width of the 95% confidence interval around the mean of the scores
reported for each benchmark, expressed as a percentage of the mean.
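
For reference, a minimal sketch of how a confidence-interval width of
this kind can be computed is below. It only illustrates the standard
formula (width ~ t_crit * stddev / sqrt(n), reported as a percentage of
the mean); it is not the script used to produce the attached numbers,
and its example scores and t critical value are made up.

/* ci_width.c -- illustration only, not the script used for the attached data */
#include <math.h>
#include <stdio.h>

static double ci_width_percent(const double *x, int n, double t_crit)
{
        double mean = 0.0, var = 0.0;
        int i;

        for (i = 0; i < n; i++)
                mean += x[i];
        mean /= n;

        for (i = 0; i < n; i++)
                var += (x[i] - mean) * (x[i] - mean);
        var /= (n - 1);                         /* sample variance */

        /* half-width of the 95% CI as a percentage of the mean */
        return 100.0 * t_crit * sqrt(var / n) / mean;
}

int main(void)
{
        /* made-up throughput scores, just to show the call */
        double scores[] = { 310.2, 312.8, 311.5, 309.9, 313.1 };
        int n = sizeof(scores) / sizeof(scores[0]);

        /* 2.776 is the two-sided 95% t critical value for 4 degrees of freedom */
        printf("CI half-width: %.1f%% of mean\n",
               ci_width_percent(scores, n, 2.776));
        return 0;
}

The t critical value depends on the number of runs used for each
benchmark (30 for lmbench, 5 for netperf, and so on), so it is passed
in rather than hardcoded.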

The VMI-Native & Native scores for almost all workloads match
within the 95% confidence interval. On the P4, only 4 workloads,
all lmbench microbenchmarks (forkproc,shproc,mmap,pagefault) were
outside the interval & the overheads (2%,1%,2%,1%, respectively)
were low. The opteron microbenchmark data was a little more
ragged than the P4 in terms of variance, but it appears that only
a few lmbench microbenchmarks (forkproc,execproc,shproc) were
outside their confidence intervals and they show low overheads
(4%,3%,2%, respectively); our in-house segv & divzero seemed to
show measureable overheads as well (8%,9%).

-Regards, Anne Holler (anne@xxxxxxxxxx)
2.6.16-rc6 Transparent Paravirtualization Performance Scoreboard
Updated: 03/20/2006 * Contact: Anne Holler (anne@xxxxxxxxxx)

Throughput benchmarks -> HIGHER IS BETTER -> Higher ratio is better
                     P4                   Opteron
                     VMI-Native/Native    VMI-Native/Native    Comments
Dbench
  1client            1.00 [312/311]       1.00 [425/425]
Netperf
  Receive            1.00 [948/947]       1.00 [937/937]       CpuUtil: P4(VMI:43%,Ntv:42%); Opteron(VMI:36%,Ntv:34%)
  Send               1.00 [939/939]       1.00 [937/936]       CpuUtil: P4(VMI:25%,Ntv:25%); Opteron(VMI:62%,Ntv:60%)

Latency benchmarks -> LOWER IS BETTER -> Lower ratio is better
                     P4                   Opteron
                     VMI-Native/Native    VMI-Native/Native    Comments
Kernel compile
  UP                 1.00 [221/220]       1.00 [131/131]
  SMP/2way           1.00 [117/117]       1.00 [67/67]
Lmbench process time latencies
  null call          1.00 [0.17/0.17]     1.00 [0.08/0.08]
  null i/o           1.00 [0.29/0.29]     0.92 [0.23/0.25]     Opteron: wide confidence interval
  stat               0.99 [2.14/2.16]     0.94 [2.25/2.39]     Opteron: odd, 1% outside wide confidence interval
  open clos          1.01 [3.00/2.96]     0.98 [3.16/3.24]
  slct TCP           1.00 [8.84/8.83]     0.94 [11.8/12.5]     Opteron: wide confidence interval
  sig inst           0.99 [0.68/0.69]     1.09 [0.36/0.33]     Opteron: best is 1.03 [0.34/0.33]
  sig hndl           0.99 [2.19/2.21]     1.05 [1.20/1.14]     Opteron: best is 1.02 [1.13/1.11]
  fork proc          1.02 [137/134]       1.04 [100/96]
  exec proc          1.02 [536/525]       1.03 [309/301]
  sh proc            1.01 [3204/3169]     1.02 [1551/1528]
Lmbench context switch time latencies
  2p/0K              1.00 [2.84/2.84]     1.14 [0.74/0.65]     Opteron: wide confidence interval
  2p/16K             1.01 [2.98/2.95]     0.93 [0.74/0.80]     Opteron: wide confidence interval
  2p/64K             1.02 [3.06/3.01]     1.00 [4.19/4.18]
  8p/16K             1.02 [3.31/3.26]     0.97 [1.86/1.91]
  8p/64K             1.01 [30.4/30.0]     1.00 [4.33/4.34]
  16p/16K            0.96 [7.76/8.06]     0.97 [2.03/2.10]
  16p/64K            1.00 [41.5/41.4]     1.00 [15.9/15.9]
Lmbench system latencies
  Mmap               1.02 [6681/6542]     1.00 [3452/3441]
  Prot Fault         1.06 [0.920/0.872]   1.07 [0.197/0.184]   P4+Opteron: wide confidence interval
  Page Fault         1.01 [2.065/2.050]   1.00 [1.10/1.10]
Kernel Microbenchmarks
  getppid            1.00 [1.70/1.70]     1.00 [0.83/0.83]
  segv               0.99 [7.05/7.09]     1.08 [2.95/2.72]
  forkwaitn          1.02 [3.60/3.54]     1.05 [2.61/2.48]
  divzero            0.99 [5.68/5.73]     1.09 [2.71/2.48]

System Configurations:
P4: CPU: 2.4 GHz; MEM: 1024 MB; DISK: 10K RPM SCSI; Server+Client NICs: Intel e1000 server adapter
Opteron: CPU: 2.2 GHz; MEM: 1024 MB; DISK: 10K RPM SCSI; Server+Client NICs: Broadcom NetXtreme BCM5704
UP kernel used for all workloads except SMP kernel compile

Benchmark Descriptions:
Dbench: repeated until the 95% confidence interval was within 5% of the mean; mean reported
  version 2.0, run as "time ./dbench -c client_plain.txt 1"
Netperf: best of 5 runs
  MessageSize:8192 + SocketSize:65536; netperf -H client-ip -l 60 -t TCP_STREAM
Kernel compile: best of 3 runs
  build of 2.6.11 kernel w/gcc 4.0.2 via "time make -j 16 bzImage"
Lmbench: average of best 18 of 30 runs
  version 3.0-a4; obtained from sourceforge
Kernel microbenchmarks: average of best 3 of 5 runs (a sketch of two of these follows the descriptions)
  getppid: loop of 10 calls to getppid, repeated 1,000,000 times
  segv: delivery of a SIGSEGV signal, repeated 3,000,000 times
  forkwaitn: fork/wait for child to exit, repeated 40,000 times
  divzero: divide-by-0 fault, repeated 3,000,000 times
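
The in-house microbenchmark sources are not included here, so as a
rough illustration only, below is a minimal sketch of what the getppid
& forkwaitn loops described above might look like as plain userspace
programs; the structure & timing method are simplifications, not the
actual in-house code, & only the iteration counts are taken from the
descriptions above.

/* microbench.c -- illustrative sketch, not the in-house benchmark code */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static double now_sec(void)
{
        struct timeval tv;

        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
}

/* loop of 10 calls to getppid, repeated 1,000,000 times */
static void bench_getppid(void)
{
        double start;
        int i, j;

        start = now_sec();
        for (i = 0; i < 1000000; i++)
                for (j = 0; j < 10; j++)
                        getppid();
        printf("getppid:   %.2f s\n", now_sec() - start);
}

/* fork/wait for child to exit, repeated 40,000 times */
static void bench_forkwaitn(void)
{
        double start;
        int i;

        start = now_sec();
        for (i = 0; i < 40000; i++) {
                pid_t pid = fork();

                if (pid == 0)
                        _exit(0);               /* child exits immediately */
                else if (pid > 0)
                        waitpid(pid, NULL, 0);  /* parent reaps the child */
                else
                        exit(1);                /* fork failed */
        }
        printf("forkwaitn: %.2f s\n", now_sec() - start);
}

int main(void)
{
        bench_getppid();
        bench_forkwaitn();
        return 0;
}
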
2.6.16-rc6 Transparent Paravirtualization Performance Confidence Interval Widths
Updated: 03/20/2006 * Contact: Anne Holler (anne@xxxxxxxxxx)
Values are the width of the 95% confidence interval around the mean, expressed as a percentage of the mean

                     P4                        Opteron
                     Native    VMI-Native      Native    VMI-Native
Dbench 2.0
  1client            5.0%      1.4%            0.8%      3.6%
Netperf
  Receive            0.1%      0.0%            0.0%      0.0%
  Send               0.6%      1.8%            0.0%      0.0%
Kernel compile
  UP                 3.4%      2.6%            2.2%      0.0%
  SMP/2way           2.4%      4.9%            4.3%      4.2%
Lmbench process time latencies
  null call          0.0%      0.0%            0.0%      0.0%
  null i/o           0.0%      0.0%            5.2%      10.8%
  stat               1.0%      1.0%            1.7%      3.2%
  open clos          1.3%      0.7%            2.4%      3.0%
  slct TCP           0.3%      0.3%            19.9%     20.1%
  sig inst           0.3%      0.5%            0.0%      5.5%
  sig hndl           0.4%      0.4%            2.0%      2.0%
  fork proc          0.5%      0.9%            0.8%      1.0%
  exec proc          0.8%      0.9%            1.0%      0.7%
  sh proc            0.1%      0.2%            0.9%      0.4%
Lmbench context switch time latencies
  2p/0K              0.8%      1.8%            16.1%     9.9%
  2p/16K             1.5%      1.8%            10.5%     10.1%
  2p/64K             2.4%      3.0%            1.8%      1.4%
  8p/16K             4.5%      4.2%            2.4%      4.2%
  8p/64K             3.0%      2.8%            1.6%      1.5%
  16p/16K            3.1%      6.7%            2.6%      3.2%
  16p/64K            0.5%      0.5%            2.9%      2.9%
Lmbench system latencies
  Mmap               0.7%      0.3%            2.2%      2.4%
  Prot Fault         7.4%      7.5%            49.4%     38.7%
  Page Fault         0.2%      0.2%            2.4%      2.9%
Kernel Microbenchmarks
  getppid            1.7%      2.9%            3.5%      3.5%
  segv               2.3%      0.7%            1.8%      1.9%
  forkwaitn          0.8%      0.8%            5.3%      2.2%
  divzero            0.9%      1.3%            1.2%      1.1%