Re: Attaching qcow2 images to containers

From: Kirill Tkhai
Date: Mon Jul 25 2022 - 17:20:32 EST


Hi, Stefan,

sorry for the late reply. I missed your message since I don't use that email address anymore.
I've sent a patch to fix the stale address in .mailmap.

On 18.05.2022 09:30, Stefan Hajnoczi wrote:
> Hi Kirill,
> I saw your "[PATCH 0/4] dm: Introduce dm-qcow2 driver to attach QCOW2
> files as block device" patch series:
> https://lore.kernel.org/linux-kernel/YkME5ZS2CpXuNmN6@xxxxxxxxxxxxx/T/
>
> There has been recent work in vDPA (VIRTIO Data Path Acceleration) to
> achieve similar functionality. The qemu-storage-daemon VDUSE export
> attaches a virtio-blk device to the host kernel and QEMU's qcow2
> implementation can be used:
> https://patchew.org/QEMU/20220504074051.90-1-xieyongji@xxxxxxxxxxxxx/
>
> A container can then access this virtio-blk device (/dev/vda). Note that
> the virtio-blk device is implemented in software using vDPA/VDUSE; there
> is no virtio-pci device.
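
My understanding is that the VDUSE setup described above looks roughly like
this (a sketch, assuming a QEMU build with the vduse-blk export from the
series above and the vdpa tool from iproute2; all names are illustrative):

$ modprobe vduse virtio_vdpa
$ qemu-storage-daemon \
    --blockdev file,filename=test.qcow2,node-name=file0 \
    --blockdev qcow2,file=file0,node-name=qcow2-0 \
    --export vduse-blk,id=vduse0,node-name=qcow2-0,name=vduse0
$ vdpa dev add name vduse0 mgmtdev vduse
$ # /dev/vda appears once virtio_vdpa binds the new vDPA device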
>
> As a quick comparison with a dm-qcow2 target, this approach keeps the
> qcow2 code in QEMU userspace and can take advantage of QEMU block layer
> features (storage migration/mirroring/backup, snapshots, etc). On the
> other hand, it's likely to be more heavyweight because bounce buffers
> are required in VDUSE for security reasons, there is a separate
> userspace process involved, and there's the virtio_blk.ko driver and an
> emulated virtio-blk device involved.

The main idea is to reach the best performance and density. This is possible
only if the driver's hot path is implemented in the kernel. Compared to NBD,
the driver shows much better results by these criteria.

This has a continuation, and I mean DAX here. IO handling in any
userspace-based implementation will be slower than DAX in a kernel-based
implementation. So, in my driver IO handling is done in the kernel, while
DAX support is a subject of future development.

Also, this driver and the advantages of the QEMU block layer are not mutually
exclusive. This driver *does not implement* snapshot or backup support; it
contains only the hot path doing IO handling.
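
For completeness, attaching an image with the driver follows the usual
device-mapper flow via dmsetup. Only a sketch here, since the exact table
arguments of the qcow2 target are described in the series' documentation;
the placeholders are illustrative:

$ # table format: "<start> <sectors> qcow2 <target arguments>"
$ dmsetup create qcow2-dev --table "0 <sectors> qcow2 <target arguments>"
$ ls -l /dev/mapper/qcow2-dev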

> Another similar feature that was recently added to QEMU is the
> qemu-storage-daemon FUSE export:
>
> $ qemu-storage-daemon \
> --blockdev file,filename=test.img,node-name=drive0 \
> --export fuse,node-name=drive0,id=fuse0,mountpoint=/tmp/foo
> $ ls -alF /tmp/foo
> -r--------. 1 me me 10737418240 May 18 07:22 /tmp/foo
>
> This exports a disk image as a file via FUSE. Programs can access it
> like a regular file and qemu-storage-daemon will do the qcow2 I/O on the
> underlying file.
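
Since the export is a regular file, it should in principle also be possible
to attach it as a block device through a loop device (a sketch, I have not
measured this path):

$ losetup -f --show /tmp/foo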
>
> I wanted to mention these options for exposing qcow2 disk images to
> processes/containers on the host. Depending on your use cases they might
> be interesting. Performance comparisons against VDUSE and FUSE exports
> would be interesting since these new approaches seem to be replacing
> qemu-nbd.
>
> Can you share more about your use cases for the dm-qcow2 target? It
> could be useful for everyone I've CCed to be aware of various efforts in
> this area.

The use case is containers, and they are what demands high density. The
overhead of a userspace-based implementation will be noticeable on nodes
running a lot of containers (simply because any overhead is noticeable
there :)). Also, it's very useful to be able to use the same disk image for
both VMs and containers, letting people choose what they want at the moment.
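
As for the performance comparisons you mention: a simple starting point
would be to run the same fio job against each attachment type, e.g. the
dm-qcow2 device, the VDUSE-backed /dev/vda and the FUSE-exported file
(paths below are illustrative):

$ fio --name=randread --filename=/dev/mapper/qcow2-dev \
      --rw=randread --bs=4k --iodepth=32 --ioengine=libaio \
      --direct=1 --runtime=30 --time_based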

Best wishes,
Kirill