RE: [PATCH v2 0/5] virtio mmio specification enhancement
From: Pincus, Josh
Date: Mon Aug 03 2020 - 19:31:26 EST
Thank you for the reply.
Please see my inline response below.
From: Alex Bennée <alex.bennee@xxxxxxxxxx>
Sent: Friday, July 31, 2020 8:45 AM
To: Pincus, Josh <Josh.Pincus@xxxxxxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx; zhabin@xxxxxxxxxxxxxxxxx; virtio-dev@xxxxxxxxxxxxxxxxxxxx; qemu-devel@xxxxxxxxxx
Subject: Re: [PATCH v2 0/5] virtio mmio specification enhancement
Pincus, Josh <Josh.Pincus@xxxxxxxxxxxxx> writes:
> We were looking into a similar enhancement for the virtio MMIO transport and came across this project.
> This enhancement would be perfect for us.
So there is certainly interest in optimising MMIO-based virtio, and the current read/ack cycle adds an extra round trip for any trap-and-emulate hypervisor. However, I think there is some resistance to making MMIO a re-implementation of what PCI already gives us for "free".
I believe the current questions that need to be addressed are:
- Clear definitions in the spec of doorbells/notifications
The current virtio spec uses different terms for the same concepts in
some places, so it would be nice to clarify the language and formalise
what the standard expects from transports w.r.t. the capabilities of
notifications and doorbells.
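To make the two terms concrete, here is a minimal sketch of how they map onto today's virtio-mmio (version 2) register interface: the doorbell is the driver's write to QueueNotify, and the notification is the device raising its single interrupt, which the driver must then read (InterruptStatus) and acknowledge (InterruptACK). The offsets and bit names mirror Linux's include/uapi/linux/virtio_mmio.h; the `mmio_dev` struct and the "one trap per register access" counter are hypothetical stand-ins for a trap-and-emulate hypervisor.

```c
#include <stdint.h>

/* virtio-mmio (version 2) register offsets, as in the virtio spec. */
#define VIRTIO_MMIO_QUEUE_NOTIFY     0x050 /* driver -> device doorbell */
#define VIRTIO_MMIO_INTERRUPT_STATUS 0x060 /* why the device interrupted */
#define VIRTIO_MMIO_INTERRUPT_ACK    0x064 /* driver acknowledges it */

/* InterruptStatus bits. */
#define VIRTIO_MMIO_INT_VRING  (1u << 0) /* used buffer notification */
#define VIRTIO_MMIO_INT_CONFIG (1u << 1) /* configuration change notification */

/* Hypothetical emulated register window: every driver access to it is
 * counted as one trap into the hypervisor. */
struct mmio_dev {
    uint32_t regs[0x100 / 4];
    unsigned traps;
    uint32_t last_kicked_queue;
};

static uint32_t mmio_read(struct mmio_dev *d, uint32_t off)
{
    d->traps++;
    return d->regs[off / 4];
}

static void mmio_write(struct mmio_dev *d, uint32_t off, uint32_t val)
{
    d->traps++;
    switch (off) {
    case VIRTIO_MMIO_QUEUE_NOTIFY:
        d->last_kicked_queue = val;  /* device learns which queue has work */
        break;
    case VIRTIO_MMIO_INTERRUPT_ACK:
        /* Acknowledged bits are cleared from InterruptStatus. */
        d->regs[VIRTIO_MMIO_INTERRUPT_STATUS / 4] &= ~val;
        break;
    default:
        d->regs[off / 4] = val;
    }
}

/* Doorbell: the driver tells the device that queue q has new buffers. */
static void driver_kick(struct mmio_dev *d, uint32_t q)
{
    mmio_write(d, VIRTIO_MMIO_QUEUE_NOTIFY, q);
}

/* Notification: the device sets a cause bit and raises its interrupt.
 * This happens on the device side, so no guest trap is counted. */
static void device_notify(struct mmio_dev *d, uint32_t why)
{
    d->regs[VIRTIO_MMIO_INTERRUPT_STATUS / 4] |= why;
}

/* Driver interrupt handler: read the cause, then acknowledge it. */
static uint32_t driver_irq(struct mmio_dev *d)
{
    uint32_t status = mmio_read(d, VIRTIO_MMIO_INTERRUPT_STATUS);
    mmio_write(d, VIRTIO_MMIO_INTERRUPT_ACK, status);
    return status;
}
```

In this model a kick costs one trap and every interrupt costs two (the read plus the ack), which is the round trip referred to above.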
[JP] The read/ack cycle not only adds to the round-trip time for any trap-and-emulate HYP, it also rules out environments where one wants to avoid emulation entirely. We're interested in combining the MMIO transport with an augmented device node in the DTB that conveys the device features, reserved memory for the queues, and a dedicated MSI per queue to the guest statically. In this kind of restricted environment, feature negotiation might be disabled completely: the driver sees what the device node describes and either supports those features or doesn't. Likewise, the standard sequence of state-machine transitions used to communicate driver and device status would be skipped. A guest driver comes up, reads the device node, uses the queues as described, and assigns one MSI vector to each queue and one to the config-has-changed notification. When an interrupt arrives, there's no need to acknowledge it beyond the usual EOI to the hardware. And with one dedicated interrupt per queue, the driver doesn't have to select the queue in question and test which one was updated. In short, we are experimenting with eliminating emulation wherever we can.
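A rough sketch of the interrupt-path difference described above. It assumes the current Linux virtio-mmio behaviour of one shared interrupt whose used-buffer bit doesn't identify the queue, so the handler walks every virtqueue, and contrasts it with a hypothetical per-queue MSI handler. `struct vdev`, `has_work[]` and the trap counter are illustrative stand-ins, not kernel APIs, and the EOI to the interrupt controller is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_QUEUES 4
#define VIRTIO_MMIO_INT_VRING (1u << 0) /* used buffer notification */

/* Hypothetical guest-side model. traps counts emulated register accesses;
 * has_work[q] stands in for "queue q's used ring has new entries". */
struct vdev {
    uint32_t int_status;
    unsigned traps;
    bool has_work[NUM_QUEUES];
    unsigned queues_scanned;
    unsigned processed;
};

/* Check one queue and consume its pending work, if any. */
static void process_queue(struct vdev *d, unsigned q)
{
    d->queues_scanned++;
    if (d->has_work[q]) {
        d->has_work[q] = false;
        d->processed++;
    }
}

/* Shared interrupt: read InterruptStatus and write InterruptACK (two
 * traps), then scan every queue because the cause bit is not per queue. */
static void shared_irq_handler(struct vdev *d)
{
    d->traps++;                  /* read InterruptStatus */
    uint32_t status = d->int_status;
    d->traps++;                  /* write InterruptACK */
    d->int_status &= ~status;
    if (status & VIRTIO_MMIO_INT_VRING)
        for (unsigned q = 0; q < NUM_QUEUES; q++)
            process_queue(d, q);
}

/* Per-queue MSI (hypothetical, vector assigned from the DTB): the vector
 * already identifies the queue, so no device register is touched. */
static void per_queue_irq_handler(struct vdev *d, unsigned q)
{
    process_queue(d, q);
}
```

With one pending queue, the shared handler costs two traps and four queue checks, while the per-queue handler costs no traps and a single check.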
- Quantifying the memory footprint difference between PCI and MMIO
PCI gives a lot for free, including a discovery and IRQ model already
designed to handle MSI/MSI-X. There is a claim that this brings in a
lot of bloat, but I think there was some debate around the numbers.
My rough initial experiment with a PCI and a non-PCI build with
otherwise identical VIRTIO configs gave the following:
16:40:15 c.282% [alex@zen:~/l/l/builds] review/rpmb|… + ls -l arm64/vmlinux arm64.nopci/vmlinux
-rwxr-xr-x 1 alex alex 83914728 Jul 31 16:39 arm64.nopci/vmlinux*
-rwxr-xr-x 1 alex alex 86368080 Jul 31 16:33 arm64/vmlinux*
which certainly implies there could be a fair amount of headroom for
an MMIO version to implement some features. However, I don't know if
it's fully apples-to-apples, as there may be PCI bloat that a
virtio-only kernel doesn't need.
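For scale, the arithmetic on the two `ls -l` sizes above as a small C sketch: the PCI-enabled image is 2,453,352 bytes (about 2.3 MiB) larger, roughly 2.8% of its size. These are on-disk vmlinux sizes, which may include symbol and debug data, so they only approximate the runtime memory difference.

```c
/* File sizes in bytes, taken from the ls -l output above. */
#define VMLINUX_PCI   86368080LL
#define VMLINUX_NOPCI 83914728LL

/* Extra bytes carried by the PCI-enabled build. */
static long long pci_overhead_bytes(void)
{
    return VMLINUX_PCI - VMLINUX_NOPCI;
}

/* That overhead as a percentage of the PCI-enabled image. */
static double pci_overhead_percent(void)
{
    return 100.0 * (double)pci_overhead_bytes() / (double)VMLINUX_PCI;
}
```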
[JP] Apropos of your subsequent email on this topic, the PCI bloat isn't terrible. The main obstacle in our case is that we want to find out whether there's a restricted model in which emulation can be removed completely. Case in point: virtio-based RPMsg in OpenAMP uses the queues only to transfer data back and forth (unless I'm mistaken). We'd like to see whether that model can be generalized a bit, so that other kinds of drivers can be built that likewise don't rely on emulation for interrupt read/ack, feature negotiation, queue selection, etc. Memory for the queues and read-only device registers is mapped into the guest, interrupts are assigned per queue in the DTB, and features are, essentially, non-negotiable.
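One way to picture this restricted model is a static description that the guest driver consumes without touching any emulated register. The sketch assumes a hypothetical DTB binding carrying a fixed feature set, per-queue ring addresses in reserved memory, and one MSI per queue (plus one for config changes); `struct static_virtio_node` and `static_probe()` are invented names for illustration, not an existing kernel or OpenAMP API.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUES 8

/* Hypothetical per-queue entry of the device node. */
struct static_vq_desc {
    uint64_t desc_addr;    /* descriptor table in reserved memory */
    uint64_t driver_addr;  /* available (driver) ring */
    uint64_t device_addr;  /* used (device) ring */
    uint16_t size;         /* number of descriptors */
    uint32_t irq;          /* dedicated MSI for this queue */
};

/* Hypothetical device node: everything is fixed, nothing is negotiated. */
struct static_virtio_node {
    uint64_t features;     /* feature bits in effect for this device */
    uint32_t config_irq;   /* MSI for config-has-changed */
    size_t nqueues;
    struct static_vq_desc vq[MAX_QUEUES];
};

/* Probe without negotiation: accept the device only if the driver supports
 * every described feature, then bind one interrupt per queue.
 * Returns the number of queues bound, or -1 if the device is rejected. */
static int static_probe(const struct static_virtio_node *node,
                        uint64_t driver_supported,
                        uint32_t irq_for_queue[MAX_QUEUES])
{
    if (node->features & ~driver_supported)
        return -1;         /* a feature we don't understand is in effect */
    if (node->nqueues > MAX_QUEUES)
        return -1;
    for (size_t i = 0; i < node->nqueues; i++)
        if (node->vq[i].size == 0)
            return -1;     /* malformed queue description */
    for (size_t i = 0; i < node->nqueues; i++)
        irq_for_queue[i] = node->vq[i].irq;
    return (int)node->nqueues;
}
```

Rejecting the device on an unknown feature, rather than silently masking it as normal negotiation would, follows from the features being in effect whether or not the driver understands them.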
What are the features you are most interested in?
[JP] See above. 😉 The restricted environment in question targets very simple applications that don't have any PCI infrastructure, and virtual environments with no HYP or only a very restricted one.
> Has there been any progress since February 2020? It looks like the effort
> might have stalled.
I can't speak for the OPs, but there is certainly interest from others besides the original posters.
[JP] Maybe we can restart the thread/discussion and see where it goes from here.