Re: [Xen-devel] Re: [PATCH v3] xen block backend.

From: Konrad Rzeszutek Wilk
Date: Wed Apr 27 2011 - 18:08:15 EST


On Thu, Apr 21, 2011 at 04:04:12AM -0400, Christoph Hellwig wrote:
> On Thu, Apr 21, 2011 at 08:28:45AM +0100, Ian Campbell wrote:
> > On Thu, 2011-04-21 at 04:37 +0100, Christoph Hellwig wrote:
> > > This should sit in userspace. And last time was discussed the issue
> > > Stefano said the qemu Xen disk backend is just as fast as this kernel
> > > code. And that's with an not even very optimized codebase yet.
> >
> > Stefano was comparing qdisk to blktap. This patch is blkback which is a
> > completely in-kernel driver which exports raw block devices to guests,
> > e.g. it's very useful in conjunction with LVM, iSCSI, etc. The last
> > measurements I heard was that qdisk was around 15% down compared to
> > blkback.
>
> Please show real numbers on why adding this to kernel space is required.

First off, many thanks go out to Alyssa Wilk and Vivek Goyal.

Alyssa for cluing me in on the CPU banding problem (on the first machine I
tested on I hit the CPU ceiling and got quite skewed results).
Vivek for helping me figure out why the kernel blkback was sucking when a READ
request got added to the stream of WRITEs with the CFQ scheduler (I did not set
REQ_SYNC on the WRITE requests).

The setup is as follows:

iSCSI target - running Linux v2.6.39-rc4 with TCM LIO-4.1 patches (which
provide iSCSI and Fibre target support) [1]. It exports a 10GB RAMdisk over
a 1GB network connection.

iSCSI initiator - Sandy Bridge i3-2100 3.1GHz w/8GB, runs v2.6.39-rc4
with pv-ops patches [2]. Either 32-bit or 64-bit, and with Xen-unstable
(c/s 23246), Xen QEMU (e073e69457b4d99b6da0b6536296e3498f7f6599) with
one patch to enable aio [3]. Upstream QEMU version is quite close to this
one (it has a bug-fix in it). Dom0/DomU memory is limited to 2GB.
I boot off PXE and run everything from the ramdisk.

The kernel/initramfs that I am using for this testing is the same
throughout and is based off VirtualIron's build system [4].

There are two tests; each is run three times.

The first is random writes of 64K across the disk with four threads
doing this pounding. The results are in the 'randw-bw.png' file.

The second is based off IOMeter - it does random reads (20%) and writes
(80%) with block sizes ranging from 512 bytes up to 64K, using two
threads. The results are in the 'iometer-bw.png' file.
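
To give a feel for what that block-size mix amounts to, here is a small
Python sketch (not part of the test setup) that parses the bssplit string
from the attached iometer job file and computes the average request size.
The helper names parse_size, parse_bssplit and mean_block_size are made up
for this illustration, and it only handles the plain size/percentage form
used in that file.

```python
# Sketch: parse a fio-style bssplit string ("size/percent:size/percent:...")
# and compute the weighted average block size. Helper names are hypothetical.

UNITS = {"k": 1024, "m": 1024 ** 2}

def parse_size(text):
    """Convert '512', '4k' or '1m' to bytes."""
    text = text.strip().lower()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)

def parse_bssplit(spec):
    """Return a list of (block_size_bytes, percent) pairs."""
    pairs = []
    for entry in spec.split(":"):
        size, percent = entry.split("/")
        pairs.append((parse_size(size), int(percent)))
    return pairs

def mean_block_size(pairs):
    """Weighted average block size in bytes; percentages must sum to 100."""
    total = sum(percent for _, percent in pairs)
    if total != 100:
        raise ValueError("percentages sum to %d, not 100" % total)
    return sum(size * percent for size, percent in pairs) / 100

# The bssplit line from the attached iometer job file.
BSSPLIT = "512/10:1k/5:2k/5:4k/60:8k/2:16k/4:32k/4:64k/10"
pairs = parse_bssplit(BSSPLIT)
print(len(pairs), "block sizes, mean %.2f bytes" % mean_block_size(pairs))
```

So 60% of the requests are 4K, but the 64K tail pulls the mean request
size up to roughly 11K.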

Also attached are the 'write' and 'iometer' fio files I used.

The guest config files are quite simple. They look like this:

kernel="/mnt/lab/latest/vmlinuz"
ramdisk="/mnt/lab/latest/initramfs.cpio.gz"
extra="console=hvc0 debug earlyprintk=xenboot"
memory=2048
maxmem=2048
vcpus=2
name="phy-xvda"
on_crash="preserve"
vif = [ 'bridge=switch' ]
vfb = [ 'vnc=1, vnclisten=0.0.0.0,vncunused=1']
disk = [ 'phy:/dev/sdb,xvda,w']

or to use QEMU qdisk:

kernel="/mnt/lab/latest/vmlinuz"
ramdisk="/mnt/lab/latest/initramfs.cpio.gz"
extra="console=hvc0 debug earlyprintk=xenboot"
memory=2048
maxmem=2048
vcpus=2
name="qdisk-xvda"
on_crash="preserve"
vif = [ 'bridge=switch' ]
vfb = [ 'vnc=1, vnclisten=0.0.0.0,vncunused=1']
disk = [ 'file:/dev/sdb,xvda,w']

/dev/sdb is naturally the LIO TCM RAMDISK.
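
Since the two configs differ only in the name= and disk= lines, they can be
generated from one template. A rough Python sketch of that follows;
guest_config and its arguments are names invented for this example, and it
only reproduces the settings shown above.

```python
# Sketch: render the two guest configs from one template.
# guest_config() is a hypothetical helper, not part of any Xen tool.

TEMPLATE = """kernel="/mnt/lab/latest/vmlinuz"
ramdisk="/mnt/lab/latest/initramfs.cpio.gz"
extra="console=hvc0 debug earlyprintk=xenboot"
memory=2048
maxmem=2048
vcpus=2
name="{name}"
on_crash="preserve"
vif = [ 'bridge=switch' ]
vfb = [ 'vnc=1, vnclisten=0.0.0.0,vncunused=1']
disk = [ '{prefix}:{device},xvda,w']
"""

def guest_config(prefix, device="/dev/sdb"):
    """prefix is 'phy' (blkback) or 'file' (qdisk, as in the config above)."""
    names = {"phy": "phy-xvda", "file": "qdisk-xvda"}
    if prefix not in names:
        raise ValueError("unknown disk prefix: %r" % prefix)
    return TEMPLATE.format(name=names[prefix], prefix=prefix, device=device)

for prefix in ("phy", "file"):
    print(guest_config(prefix))
```

Writing each result to its own file gives the two configs above, which
keeps everything except the disk backend identical between runs.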

[1]: git://git.kernel.org/pub/scm/linux/kernel/git/nab/lio-core-2.6.git #lio-4.1
[2]: git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git #devel/next-2.6.39
[3]: http://darnok.org/xen/qdisk_vs_blkback_v3.1/qemu-enable-aio.patch
[4]: git://xenbits.xensource.com/xentesttools/bootstrap.git

Attachment: iometer-bw.png
Description: PNG image

Attachment: randw-bw.png
Description: PNG image

# This job file tries to mimic the Intel IOMeter File Server Access Pattern
[global]
description=Emulation of Intel IOmeter File Server Access Pattern
numjobs=2

[/dev/xvda]
bssplit=512/10:1k/5:2k/5:4k/60:8k/2:16k/4:32k/4:64k/10
rw=randrw
rwmixread=80
direct=1
size=4g
ioengine=libaio
# IOMeter defines the server loads as the following:
# iodepth=1 Linear
# iodepth=4 Very Light
# iodepth=8 Light
# iodepth=64 Moderate
# iodepth=256 Heavy
iodepth=256
write_bw_log=iometer
write_lat_log=iometer

# This job file does random 64K writes across the disk
[global]
description=Random 64K writes across the disk
numjobs=4

[/dev/xvda]
bs=64k
rw=randwrite
direct=1
size=4g
ioengine=libaio
# IOMeter defines the server loads as the following:
# iodepth=1 Linear
# iodepth=4 Very Light
# iodepth=8 Light
# iodepth=64 Moderate
# iodepth=256 Heavy
iodepth=256
write_bw_log=randw
write_lat_log=randw
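
Both job files write bandwidth logs (write_bw_log), and each test is run
three times, so the logs need to be reduced to one number per run before
comparing backends. Below is a minimal Python sketch of that step. It
assumes each log line starts with "time in msec, bandwidth" followed by
further comma-separated fields, which is my understanding of the fio log
layout and not something checked against these particular logs. The helper
names are hypothetical, and the sample values are placeholders, not
measurements.

```python
# Sketch: average the bandwidth samples of several fio bandwidth logs.
# Assumes each log line starts with "time_msec, bandwidth, ..."; only the
# first two comma-separated fields are used. Helper names are hypothetical.
from statistics import mean

def run_bandwidth(log_text):
    """Mean of the bandwidth column of one log, in the log's own units."""
    samples = []
    for line in log_text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # skip blank or malformed lines
        samples.append(int(fields[1]))
    if not samples:
        raise ValueError("no bandwidth samples found")
    return mean(samples)

def summarize(logs):
    """Return (per-run means, mean over all runs)."""
    per_run = [run_bandwidth(text) for text in logs]
    return per_run, mean(per_run)

# Placeholder values only, NOT measurements from these tests.
EXAMPLE_LOGS = [
    "1000, 100, 1, 65536\n2000, 120, 1, 65536\n",
    "1000, 90, 1, 65536\n2000, 110, 1, 65536\n",
    "1000, 105, 1, 65536\n2000, 95, 1, 65536\n",
]
per_run, overall = summarize(EXAMPLE_LOGS)
print(per_run, overall)
```

In practice the three strings would be read from the log files of the
three runs of one test on one backend.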