On 3/13/19 5:39 PM, Cornelia Huck wrote:
On Wed, 13 Mar 2019 11:26:04 +0800
Dongli Zhang <dongli.zhang@xxxxxxxxxx> wrote:
On 3/13/19 1:33 AM, Cornelia Huck wrote:
On Tue, 12 Mar 2019 10:22:46 -0700 (PDT)
Dongli Zhang <dongli.zhang@xxxxxxxxxx> wrote:
I observed that there is one msix vector for config and one shared vector
for all queues with the qemu cmdline below, when num-queues for virtio-blk
is larger than the number of possible cpus:
qemu: "-smp 4" while "-device virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=6"
# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
... ...
24: 0 0 0 0 PCI-MSI 65536-edge virtio0-config
25: 0 0 0 59 PCI-MSI 65537-edge virtio0-virtqueues
... ...
Please note that this is pci-specific...

However, when num-queues is the same as the number of possible cpus:
qemu: "-smp 4" while "-device virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=4"
# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
... ...
24: 0 0 0 0 PCI-MSI 65536-edge virtio0-config
25: 2 0 0 0 PCI-MSI 65537-edge virtio0-req.0
26: 0 35 0 0 PCI-MSI 65538-edge virtio0-req.1
27: 0 0 32 0 PCI-MSI 65539-edge virtio0-req.2
28: 0 0 0 0 PCI-MSI 65540-edge virtio0-req.3
... ...
In the above case, there is one msix vector per queue.

...and these are probably pci-specific as well.

Not pci-specific, but per-cpu as well.

Ah, I meant that those are pci devices.
This is because the max number of queues is not limited by the number of
possible cpus.
By default, nvme (regardless of write_queues and poll_queues) and
xen-blkfront limit the number of queues with num_possible_cpus().
Is this by design, or can we fix it with the patch below?
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 4bc083b..df95ce3 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk)
 	if (err)
 		num_vqs = 1;
 
+	num_vqs = min_t(unsigned int, num_possible_cpus(), num_vqs);
+
 	vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL);
 	if (!vblk->vqs)
 		return -ENOMEM;
virtio-blk, however, is not pci-specific.

If we are using the ccw transport on s390, a completely different
interrupt mechanism is in use ('floating' interrupts, which are not
per-cpu). A check like that should therefore not go into the generic
driver.
So far there seem to be two options.

The 1st option is to ask the qemu user to always specify "num-queues" equal to
the number of vcpus when running an x86 guest with pci for virtio-blk or
virtio-scsi, in order to assign a vector to each queue.
That does seem like an extra burden for the user: IIUC, things work
even if you have too many queues, it's just not optimal. It sounds like
something that can be done by a management layer (e.g. libvirt), though.
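(For what it's worth, libvirt already exposes such a knob: the disk <driver>
element accepts a queues attribute, e.g. <driver name='qemu' type='qcow2'
queues='4'/>, which a management layer could simply set to the vcpu count.)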
Or, is it fine for the virtio folks to add a new hook to 'struct virtio_config_ops'
so that different platforms (e.g., pci or ccw) would use different ways to limit
the max number of queues in the guest, with something like below?
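For example (just a sketch: "get_max_num_queues" is a made-up name here, not an
existing member of virtio_config_ops):

	/*
	 * Hypothetical transport hook in struct virtio_config_ops:
	 *   unsigned short (*get_max_num_queues)(struct virtio_device *vdev);
	 * virtio_pci could return num_possible_cpus(), while virtio_ccw
	 * could leave it NULL (i.e. no transport-specific limit).
	 */

	/* ... and then in virtio_blk's init_vq(): */
	if (vdev->config->get_max_num_queues)
		num_vqs = min_t(unsigned int, num_vqs,
				vdev->config->get_max_num_queues(vdev));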
That sounds better, as both transports and drivers can opt-in here.

However, maybe it would be even better to try to come up with a better
strategy of allocating msix vectors in virtio-pci. More vectors in the
num_queues > num_cpus case, even if they still need to be shared?
Individual vectors for n-1 cpus and then a shared one for the remaining
queues?
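For instance, the split could be as simple as this (an illustration of the idea
only, not existing virtio_pci code):

	/*
	 * Give the first nvec_for_queues - 1 queues a dedicated vector and
	 * let every remaining queue share the last one.
	 */
	static u16 queue_to_vector(u16 queue, u16 nvec_for_queues)
	{
		if (queue < nvec_for_queues - 1)
			return queue;
		return nvec_for_queues - 1;
	}

With the 6-queue/4-vcpu example above, queues 0-2 would get their own vector
and queues 3-5 would share the fourth.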
It might even be device-specific: Have some low-traffic status queues
share a vector, and provide an individual vector for high-traffic
queues. Would need some device<->transport interface, obviously.
This sounds a little bit similar to multiple hctx maps?

So far, as virtio-blk only supports set->nr_maps = 1, no matter how many hw
queues are assigned for virtio-blk, blk_mq_alloc_tag_set() would use at most
nr_cpu_ids hw queues.
2981 int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
... ...
3021 /*
3022 * There is no use for more h/w queues than cpus if we just have
3023 * a single map
3024 */
3025 if (set->nr_maps == 1 && set->nr_hw_queues > nr_cpu_ids)
3026 set->nr_hw_queues = nr_cpu_ids;
Even the block layer would limit the number of hw queues to nr_cpu_ids when
(set->nr_maps == 1).
That's why I think virtio-blk should use a similar solution to nvme
(regardless of write_queues and poll_queues) and xen-blkfront.
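For reference, "multiple maps" would look roughly like the sketch below in a
driver (modeled on how nvme/null_blk split their queues; nr_poll_queues is just
an illustrative variable, not something virtio-blk has today). With nr_maps > 1
the nr_cpu_ids cap quoted above no longer applies to the total:

	static unsigned int nr_poll_queues;	/* illustrative only */

	/* the tag_set would also have set->nr_maps = HCTX_MAX_TYPES */
	static int example_map_queues(struct blk_mq_tag_set *set)
	{
		struct blk_mq_queue_map *dmap = &set->map[HCTX_TYPE_DEFAULT];
		struct blk_mq_queue_map *pmap = &set->map[HCTX_TYPE_POLL];

		/* regular queues come first ... */
		dmap->queue_offset = 0;
		dmap->nr_queues = set->nr_hw_queues - nr_poll_queues;
		blk_mq_map_queues(dmap);

		/* ... and the poll queues start right after them */
		pmap->queue_offset = dmap->nr_queues;
		pmap->nr_queues = nr_poll_queues;
		blk_mq_map_queues(pmap);

		return 0;
	}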
Added Jason again. I do not know why the mailing list of
virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx always filters out Jason's email...
Dongli Zhang