Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?

From: Dongli Zhang
Date: Thu Mar 14 2019 - 12:08:35 EST

On 03/14/2019 08:13 PM, Cornelia Huck wrote:
> On Thu, 14 Mar 2019 14:12:32 +0800
> Dongli Zhang <dongli.zhang@xxxxxxxxxx> wrote:
>> On 3/13/19 5:39 PM, Cornelia Huck wrote:
>>> On Wed, 13 Mar 2019 11:26:04 +0800
>>> Dongli Zhang <dongli.zhang@xxxxxxxxxx> wrote:
>>>> On 3/13/19 1:33 AM, Cornelia Huck wrote:
>>>>> On Tue, 12 Mar 2019 10:22:46 -0700 (PDT)
>>>>> Dongli Zhang <dongli.zhang@xxxxxxxxxx> wrote:
>>>>>> Is this by design on purpose, or can we fix with below?
>>>>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>>>>> index 4bc083b..df95ce3 100644
>>>>>> --- a/drivers/block/virtio_blk.c
>>>>>> +++ b/drivers/block/virtio_blk.c
>>>>>> @@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk)
>>>>>> if (err)
>>>>>> num_vqs = 1;
>>>>>> + num_vqs = min(num_possible_cpus(), num_vqs);
>>>>>> +
>>>>>> vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL);
>>>>>> if (!vblk->vqs)
>>>>>> return -ENOMEM;
>>>>> virtio-blk, however, is not pci-specific.
>>>>> If we are using the ccw transport on s390, a completely different
>>>>> interrupt mechanism is in use ('floating' interrupts, which are not
>>>>> per-cpu). A check like that should therefore not go into the generic
>>>>> driver.
>>>> So far there seems two options.
>>>> The 1st option is to ask the qemu user to always specify "-num-queues" with the
>>>> same number of vcpus when running x86 guest with pci for virtio-blk or
>>>> virtio-scsi, in order to assign a vector for each queue.
>>> That does seem like an extra burden for the user: IIUC, things work
>>> even if you have too many queues, it's just not optimal. It sounds like
>>> something that can be done by a management layer (e.g. libvirt), though.
>>>> Or, is it fine for virtio folks to add a new hook to 'struct virtio_config_ops'
>>>> so that different platforms (e.g., pci or ccw) would use different ways to limit
>>>> the max number of queues in guest, with something like below?
>>> That sounds better, as both transports and drivers can opt-in here.
>>> However, maybe it would be even better to try to come up with a better
>>> strategy of allocating msix vectors in virtio-pci. More vectors in the
>>> num_queues > num_cpus case, even if they still need to be shared?
>>> Individual vectors for n-1 cpus and then a shared one for the remaining
>>> queues?
>>> It might even be device-specific: Have some low-traffic status queues
>>> share a vector, and provide an individual vector for high-traffic
>>> queues. Would need some device<->transport interface, obviously.
>> This sounds a little bit similar to multiple hctx maps?
>> So far, as virtio-blk only supports set->nr_maps = 1, no matter how many hw
>> queues are assigned for virtio-blk, blk_mq_alloc_tag_set() would use at most
>> nr_cpu_ids hw queues.
>> 2981 int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
>> ... ...
>> 3021 /*
>> 3022 * There is no use for more h/w queues than cpus if we just have
>> 3023 * a single map
>> 3024 */
>> 3025 if (set->nr_maps == 1 && set->nr_hw_queues > nr_cpu_ids)
>> 3026 set->nr_hw_queues = nr_cpu_ids;
>> Even the block layer would limit the number of hw queues by nr_cpu_ids when
>> (set->nr_maps == 1).
> Correct me if I'm wrong, but there seem to be two kinds of limitations
> involved here:
> - Allocation of msix vectors by the virtio-pci transport. We end up
> with shared vectors if we have more virtqueues than vcpus. Other
> transports may or may not have similar issues, but essentially, this
> is something that applies to all kind of virtio devices attached via
> the virtio-pci transport.

It depends.

For virtio-net, we need to specify the number of available vectors on qemu side,

-device virtio-net-pci,netdev=tapnet,mq=true,vectors=16

This parameter is specific for virtio-net.

Suppose if 'queues=8' while 'vectors=16', as 2*8+1 > 16, there will be lack of
vectors and guest would not be able to assign one vector for each queue.

I was tortured by this long time ago and it seems qemu would minimize the memory
allocation and the default 'vectors' is 3.

BTW, why cannot we have a more consistent configuration for most qemu devices,
e.g., so far:

virtio-blk use 'num-queues'
nvme use 'num_queues'
virtio-net use 'queues' for tap :)

> - The block layer limits the number of hw queues to the number of
> vcpus. This applies only to virtio devices that interact with the
> block layer, but regardless of the virtio transport.

Yes: virtio-blk and virtio-scsi.

>> That's why I think virtio-blk should use the similar solution as nvme
>> (regardless about write_queues and poll_queues) and xen-blkfront.
> Ok, the hw queues limit from above would be an argument to limit to
> #vcpus in the virtio-blk driver, regardless of the transport used. (No
> idea if there are better ways to deal with this, I'm not familiar with
> the interface.)
> For virtio devices that don't interact with the block layer and are
> attached via the virtio-pci transport, it might still make sense to
> revisit vector allocation.

As mentioned above, we need to specify 'vectors' for virtio-net as the default
value is only 3 (config + tx + rx). That would make a little difference?

Dongli Zhang