Re: [PATCH v1 2/2] virtio-mmio: add features for virtio-mmio specification version 3

From: Liu, Jiang
Date: Fri Jan 03 2020 - 01:51:27 EST


On Jan 2, 2020, at 2:28 PM, Jason Wang <jasowang@xxxxxxxxxx> wrote:
>
>
> On 2019/12/26 äå9:16, Liu, Jiang wrote:
>>
>>> On Dec 26, 2019, at 4:09 PM, Jason Wang <jasowang@xxxxxxxxxx> wrote:
>>>
>>>
>>> On 2019/12/25 äå11:20, Liu, Jiang wrote:
>>>>> On Dec 25, 2019, at 6:20 PM, Jason Wang <jasowang@xxxxxxxxxx> wrote:
>>>>>
>>>>>
>>>>> On 2019/12/25 äå10:50, Zha Bin wrote:
>>>>>> From: Liu Jiang <gerry@xxxxxxxxxxxxxxxxx>
>>>>>>
>>>>>> Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
>>>>>> virtio over mmio devices as a lightweight machine model for modern
>>>>>> cloud. The standard virtio over MMIO transport layer only supports one
>>>>>> legacy interrupt, which is much heavier than virtio over PCI transport
>>>>>> layer using MSI. Legacy interrupt has long work path and causes specific
>>>>>> VMExits in following cases, which would considerably slow down the
>>>>>> performance:
>>>>>>
>>>>>> 1) read interrupt status register
>>>>>> 2) update interrupt status register
>>>>>> 3) write IOAPIC EOI register
>>>>>>
>>>>>> We proposed to update virtio over MMIO to version 3[1] to add the
>>>>>> following new features and enhance the performance.
>>>>>>
>>>>>> 1) Support Message Signaled Interrupt(MSI), which increases the
>>>>>> interrupt performance for virtio multi-queue devices
>>>>>> 2) Support per-queue doorbell, so the guest kernel may directly write
>>>>>> to the doorbells provided by virtio devices.
>>>>>>
>>>>>> The following is the network tcp_rr performance testing report, tested
>>>>>> with virtio-pci device, vanilla virtio-mmio device and patched
>>>>>> virtio-mmio device (run test 3 times for each case):
>>>>>>
>>>>>> netperf -t TCP_RR -H 192.168.1.36 -l 30 -- -r 32,1024
>>>>>>
>>>>>> Virtio-PCI Virtio-MMIO Virtio-MMIO(MSI)
>>>>>> trans/s 9536 6939 9500
>>>>>> trans/s 9734 7029 9749
>>>>>> trans/s 9894 7095 9318
>>>>>>
>>>>>> [1] https://lkml.org/lkml/2019/12/20/113
>>>>> Thanks for the patch. Two questions after a quick glance:
>>>>>
>>>>> 1) In PCI we choose to support MSI-X instead of MSI for having extra flexibility like alias, independent data and address (e.g for affinity) . Any reason for not start from MSI-X? E.g having MSI-X table and PBA (both of which looks pretty independent).
>>>> Hi Jason,
>>>> Thanks for reviewing patches on Christmas Day:)
>>>> The PCI MSI-x has several advantages over PCI MSI, mainly
>>>> 1) support 2048 vectors, much more than 32 vectors supported by MSI.
>>>> 2) dedicated address/data for each vector,
>>>> 3) per vector mask/pending bits.
>>>> The proposed MMIO MSI extension supports both 1) and 2),
>>>
>>> Aha right, I mis-read the patch. But more questions comes:
>>>
>>> 1) The association between vq and MSI-X vector is fixed. This means it can't work for a device that have more than 2047 queues. We probably need something similar to virtio-pci to allow a dynamic association.
>> We have considered both the PCI MSI-x like dynamic association design and fix mapping design.
>> The fix mapping design simplifies both the interrupt configuration process and VMM implementations.
>
>
> Well, for VMM just an indirection and for guest, it can choose to use fixed mapping, just need to program once during probe.
>
>
>> And the virtio mmio transport layer is mainly used by light VMMs to support small scale virtual machines,
>
>
> Let's not limit the interface to be used by a specific case :). Eliminating PCIE would be appealing for other scenarios.
>
>
>> 2048 vectors should be enough for these usage cases.
>> So the fix mapping design has been used.
>>
>>> 2) The mask and unmask control is missed
>>>
>>>
>>>> but the extension doesnât support 3) because
>>>> we noticed that the Linux virtio subsystem doesnât really make use of interrupt masking/unmasking.
>>>
>>> Not directly used but masking/unmasking is widely used in irq subsystem which allows lots of optimizations.
>>>
>>>
>>>> On the other hand, we want to simplify VMM implementations as simple as possible, and mimicking the PCI MSI-x
>>>> will cause some complexity to VMM implementations.
>>>
>>> I agree to simplify VMM implementation, but it looks to me introducing masking/pending won't cost too much code in the VMM implementation. Just new type of command for VIRTIO_MMIO_MSI_COMMAND.
>> We want to make VMM implementations as simple as possible:)
>> And based on following observations, we have disabled support of mask/unmask,
>> 1) MSI is edge triggering, which means it wonât be shared with other interrupt sources,
>
>
> Is this true? I think the spec does not forbid such usages, e.g using the same MSI address/command for different queues or devices?
Yes, the spec doesnât forbid this.
We could share the same MSIx vector for multiple queues, rx/tax etc.
But we canât share a Linux MSI interrupt for different devices/MSIx vectors, this is an implementation constraint of the Linux interrupt subsystem.

>
>
>> so masking/unmasking wonât be used for normal interrupt management logic.
>> 2) Linux virtio mmio transport layer doesnât support suspend/resume yet, so thereâs no need to quiesce the device by masking interrupts.
>
>
> Yes, but it's a limitation only for virtio mmio transport. We can add it.
>
>
>> 3) The legacy PCI 2.2 devices doesnât support irq masking/unmasking, so irq masking/unmasking may be optional operations.
>
>
> Yes, but as you said, it helps for performance and some other cases. I still prefer to implement that consider it is not complex. If we do MSI without masking/unmasking, I suspect we will implement MSI-X finally somedays then maintaining MSI will become a burden... (still takes virtio-pci as an example, it choose to implement MSI-X not MSI).
>
>
>> So we skipped support of irq masking/unmasking. We will recheck whether irq masking/unmasking is mandatory for MMIO devices.
>> On the other hand, we may enhance the spec to define command codes for masking/unmasking, and VMM may optionally support masking/unmasking.
>
>
> Yes, thanks.
>
>
>>
>> Thanks,
>> Gerry
>>
>>> Thanks
>>>
>>>
>>>>> 2) It's better to split notify_multiplexer out of MSI support to ease the reviewers (apply to spec patch as well)
>>>> Great suggestion, we will try to split the patch.
>>>>
>>>> Thanks,
>>>> Gerry
>>>>
>>>>> Thanks