Re: [PATCH v2 0/2] block: virtio-blk: support multi vq per virtio-blk

From: Ming Lei
Date: Thu Jun 26 2014 - 01:28:21 EST


On Thu, Jun 26, 2014 at 1:05 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:
> On 2014-06-25 20:08, Ming Lei wrote:
>>
>> Hi,
>>
>> These patches try to support multi virtual queues(multi-vq) in one
>> virtio-blk device, and maps each virtual queue(vq) to blk-mq's
>> hardware queue.
>>
>> With this approach, both scalability and performance on virtio-blk
>> device can get improved.
>>
>> For verifying the improvement, I implements virtio-blk multi-vq over
>> qemu's dataplane feature, and both handling host notification
>> from each vq and processing host I/O are still kept in the per-device
>> iothread context, the change is based on qemu v2.0.0 release, and
>> can be accessed from below tree:
>>
>> git://kernel.ubuntu.com/ming/qemu.git #v2.0.0-virtblk-mq.1
>>
>> For enabling the multi-vq feature, 'num_queues=N' need to be added into
>> '-device virtio-blk-pci ...' of qemu command line, and suggest to pass
>> 'vectors=N+1' to keep one MSI irq vector per each vq, and the feature
>> depends on x-data-plane.
>>
>> Fio(libaio, randread, iodepth=64, bs=4K, jobs=N) is run inside VM to
>> verify the improvement.
>>
>> I just create a small quadcore VM and run fio inside the VM, and
>> num_queues of the virtio-blk device is set as 2, but looks the
>> improvement is still obvious.
>>
>> 1), about scalability
>> - without mutli-vq feature
>> -- jobs=2, thoughput: 145K iops
>> -- jobs=4, thoughput: 100K iops
>> - with mutli-vq feature
>> -- jobs=2, thoughput: 193K iops
>> -- jobs=4, thoughput: 202K iops
>>
>> 2), about thoughput
>> - without mutli-vq feature
>> -- thoughput: 145K iops
>> - with mutli-vq feature
>> -- thoughput: 202K iops
>
>
> Of these numbers, I think it's important to highlight that the 2 thread case
> is 33% faster and the 2 -> 4 thread case scales linearly (100%) while the
> pre-patch case sees negative scaling going from 2 -> 4 threads (-39%).

This is because my qemu implementation on multi vq only uses
single iothread to handle requests from all vqs, and the only iothread
is already at full load now, that said on host side the
same fio test(single job) results is ~200K iops too.

>
> I haven't run your patches yet, but from looking at the code, it looks good.
> It's pretty straightforward. See feel free to add my reviewed-by.

Thanks a lot.

>
> Rusty, do you want to ack this (and I'll slurp it up for 3.17) or take this
> yourself? Or something else?

That is great if this can be merged to 3.17.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/