Re: ublk-qcow2: ublk-qcow2 is available

From: Stefan Hajnoczi
Date: Mon Oct 03 2022 - 17:36:17 EST

On Fri, Sep 30, 2022 at 05:24:11PM +0800, Ming Lei wrote:
> ublk-qcow2 is available now.

Cool, thanks for sharing!

> So far it provides basic read/write function; compression and snapshot
> aren't supported yet. The target/backend implementation is completely
> based on io_uring, and shares the same io_uring with the ublk IO command
> handler, just like what ublk-loop does.
>
> The main motivations behind ublk-qcow2:
>
> - building one complicated target from scratch helps libublksrv APIs/functions
> become mature/stable more quickly, since qcow2 is complicated and places more
> requirements on libublksrv than the simpler targets (loop, null)
> - there have been several attempts at implementing a qcow2 driver in the
> kernel, such as ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)``
> [4], so ublk-qcow2 might be useful for covering requirements in this field
> - performance comparison with qemu-nbd; writing a ublk-qcow2 target to
> evaluate the performance of the ublk/io_uring backend has been my plan
> since ublksrv was started
> - helping to abstract common building blocks or design patterns for writing
> new ublk targets/backends
>
> So far it basically passes the xfstests (XFS) suite using a ublk-qcow2 block
> device as TEST_DEV, and the kernel build workload is verified too. Also the
> soft updates approach is applied in metadata flushing, so metadata
> integrity is guaranteed; 'make test T=qcow2/040' covers this kind of
> test, and only a cluster leak is reported during it.
>
> The performance data looks much better compared with qemu-nbd; see details
> in the commit log[1], README[5] and STATUS[6]. The tests cover both empty
> and pre-allocated images, for example with a pre-allocated qcow2
> image (8GB):
> - qemu-nbd (make test T=qcow2/002)

Single queue?

> randwrite(4k): jobs 1, iops 24605
> randread(4k): jobs 1, iops 30938
> randrw(4k): jobs 1, iops read 13981 write 14001
> rw(512k): jobs 1, iops read 724 write 728
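
For reference, results like the above are typically produced with fio; a job
file along the following lines would exercise the same 4k single-job randwrite
case (the device path, iodepth, and runtime here are my assumptions, not taken
from the email):

```ini
; hypothetical fio job; /dev/nbd0 (or /dev/ublkb0 for ublk) and the
; runtime/iodepth values are assumptions, not from the email
[global]
filename=/dev/nbd0
direct=1
ioengine=libaio
iodepth=64
time_based=1
runtime=30

[randwrite-4k]
rw=randwrite
bs=4k
numjobs=1
```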

Please try qemu-storage-daemon's VDUSE export type as well. The
command-line should be similar to this:

# modprobe virtio_vdpa # attaches vDPA devices to host kernel
# modprobe vduse
# qemu-storage-daemon \
--blockdev file,filename=test.qcow2,cache.direct=on,aio=native,node-name=file \
--blockdev qcow2,file=file,node-name=qcow2 \
--object iothread,id=iothread0 \
--export vduse-blk,id=vduse0,name=vduse0,num-queues=$(nproc),node-name=qcow2,writable=on,iothread=iothread0
# vdpa dev add name vduse0 mgmtdev vduse

A virtio-blk device should appear and xfstests can be run on it
(typically /dev/vda unless you already have other virtio-blk devices).

Afterwards you can destroy the device using:

# vdpa dev del vduse0

> - ublk-qcow2 (make test T=qcow2/022)

There are a lot of other factors not directly related to NBD vs ublk. In
order to get an apples-to-apples comparison with qemu-* a ublk export
type is needed in qemu-storage-daemon. That way the only difference is
the ublk interface and the rest of the code path is identical, making it
possible to compare NBD, VDUSE, ublk, etc. more precisely.

I think that comparison is interesting before comparing different qcow2
implementations because qcow2 sits on top of too much other code. It's
hard to know what should be accounted to configuration differences,
implementation differences, or fundamental differences that cannot be
overcome (this is the interesting part!).

> randwrite(4k): jobs 1, iops 104481
> randread(4k): jobs 1, iops 114937
> randrw(4k): jobs 1, iops read 53630 write 53577
> rw(512k): jobs 1, iops read 1412 write 1423
>
> Also ublk-qcow2 aligns the queue's chunk_sectors limit with qcow2's cluster
> size, which is 64KB by default. This simplifies backend IO handling, but the
> limit could be increased to 512KB or another more suitable size to improve
> sequential IO performance; it just needs one coroutine to handle more than
> one IO.
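
As an illustration of that alignment, here is a minimal sketch (Python; the
names are mine, not from ublksrv) of the arithmetic involved: with
chunk_sectors set to one cluster, the block layer never issues an IO that
crosses a cluster boundary, so the backend sees at most one cluster per
request.

```python
CLUSTER_SIZE = 64 * 1024          # qcow2 default cluster size (64KB)
SECTOR_SIZE = 512                 # block-layer sector size
CHUNK_SECTORS = CLUSTER_SIZE // SECTOR_SIZE  # 128: the chunk_sectors limit

def split_into_clusters(offset: int, length: int):
    """Split a byte range into (cluster_index, offset_in_cluster, length)
    pieces. A chunk_sectors limit of one cluster means the block layer
    performs this split itself, so the backend never sees a request that
    straddles two clusters."""
    pieces = []
    while length > 0:
        cluster_index = offset // CLUSTER_SIZE
        in_cluster = offset % CLUSTER_SIZE
        n = min(length, CLUSTER_SIZE - in_cluster)
        pieces.append((cluster_index, in_cluster, n))
        offset += n
        length -= n
    return pieces

# A 16KB write at offset 60KB spans clusters 0 and 1, so it would be
# split into two pieces at the 64KB cluster boundary.
print(split_into_clusters(60 * 1024, 16 * 1024))
```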
> [1]
> [2]
> [3]
> [4]
> [5]
> [6]
> Thanks,
> Ming
