Avi Kivity wrote:
> On 08/18/2009 05:46 PM, Gregory Haskins wrote:
>>> Can you explain how vbus achieves RDMA?
>>>
>>> I also don't see the connection to real time guests.
>> Both of these are still in development. Trying to stay true to the
>> "release early and often" mantra, the core vbus technology is being
>> pushed now so it can be reviewed. Stay tuned for these other
>> developments.
>
> Hopefully you can outline how it works. AFAICT, RDMA and kernel bypass
> will need device assignment. If you're bypassing the call into the host
> kernel, it doesn't really matter how that call is made, does it?

This is for things like the setup of queue-pairs, and the transport of
doorbells and ib-verbs. I am not on the team doing that work, so I am
not an expert in this area. What I do know is that having a flexible,
low-latency signal path was deemed a key requirement.
For real-time, a big part of it is relaying the guest scheduler state to
the host, but in a smart way. For instance, the cpu priority for each
vcpu is kept in a shared table. When the priority is raised, we can simply
update the table without taking a VMEXIT. When it is lowered, we need
to inform the host of the change in case the underlying task needs to
reschedule.
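
To make that concrete, here is a minimal sketch of the idea, assuming one
shared entry per vcpu and a notification hypercall; the names
(vcpu_shared, hypercall_notify_prio) are hypothetical, not the actual
vbus/kvm interfaces:

/*
 * Hypothetical sketch of the lazy priority update described above.
 */
struct vcpu_shared {
	int prio;			/* effective priority of this vcpu */
};

extern struct vcpu_shared *shared_table;	/* one entry per vcpu, guest/host shared */
extern void hypercall_notify_prio(int vcpu);	/* forces an exit to the host */

static void guest_set_prio(int vcpu, int prio)
{
	int old = shared_table[vcpu].prio;

	shared_table[vcpu].prio = prio;	/* host reads this asynchronously */

	/*
	 * Raising the priority needs no exit: the table update suffices.
	 * Lowering it may allow a higher-priority host task to preempt us,
	 * so the host must be told right away.
	 */
	if (prio < old)
		hypercall_notify_prio(vcpu);
}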
This is where the really fast call() type mechanism is important.
It's also about having the priority flow end-to-end, and having the vcpu
interrupt state affect the task priority, etc. (e.g. pending interrupts
affect the vcpu task prio).
etc, etc.
I can go on and on (as you know ;), but will wait till this work is more
concrete and proven.
Basically, what it comes down to is both vbus and vhost need
configuration/management. Vbus does it with sysfs/configfs, and vhost
does it with ioctls. I ultimately decided to go with sysfs/configfs
because, at least at the time I looked, it seemed like the "blessed"
way to do user->kernel interfaces.
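
For reference, the kernel side of a configfs interface looks roughly like
the sketch below (loosely following Documentation/filesystems/configfs);
"vbus-demo" and the item types are placeholders for illustration, not the
real vbus code, and error handling is trimmed:

#include <linux/configfs.h>
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/err.h>

static struct config_item_type container_type = {
	.ct_owner = THIS_MODULE,
};

/* "mkdir /config/vbus-demo/<name>" from userspace ends up here */
static struct config_group *demo_make_group(struct config_group *parent,
					    const char *name)
{
	struct config_group *group = kzalloc(sizeof(*group), GFP_KERNEL);

	if (!group)
		return ERR_PTR(-ENOMEM);

	config_group_init_type_name(group, name, &container_type);
	return group;
}

static struct configfs_group_operations demo_group_ops = {
	.make_group = demo_make_group,
};

static struct config_item_type demo_type = {
	.ct_group_ops = &demo_group_ops,
	.ct_owner     = THIS_MODULE,
};

static struct configfs_subsystem demo_subsys = {
	.su_group = {
		.cg_item = {
			.ci_namebuf = "vbus-demo",
			.ci_type    = &demo_type,
		},
	},
};

static int __init demo_init(void)
{
	config_group_init(&demo_subsys.su_group);
	mutex_init(&demo_subsys.su_mutex);
	return configfs_register_subsystem(&demo_subsys);
}
module_init(demo_init);

static void __exit demo_exit(void)
{
	configfs_unregister_subsystem(&demo_subsys);
}
module_exit(demo_exit);

MODULE_LICENSE("GPL");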
> They need to be connected to the real world somehow. What about
> security? can any user create a container and devices and link them to
> real interfaces? If not, do you need to run the VM as root?

Today it has to be root as a result of weak mode support in configfs, so
you have me there. I am looking for help patching this limitation, though.
> I hope everyone agrees that it's an important issue for me and that I
> have to consider non-Linux guests. I also hope that you're considering
> non-Linux guests since they have considerable market share.

I didn't mean non-Linux guests are not important. I was disagreeing
with your assertion that it only works if it's PCI. There are numerous
examples of IHV/ISV "bridge" implementations deployed in Windows, no?
If vbus is exposed as a PCI-BRIDGE, how is this different?
> Given I'm not the gateway to inclusion of vbus/venet, you don't need to
> ask me anything. I'm still free to give my opinion.

Agreed, and I didn't mean to suggest otherwise. It's not clear if you are
wearing the "kvm maintainer" hat or the "lkml community member" hat at
times, so it's important to make that distinction. Otherwise, it's not
clear if this is an edict issued as my superior, or input offered as my
peer. ;)
> With virtio, the number is 1 (or less if you amortize). Set up the ring
> entries and kick.

Again, I am just talking about basic PCI here, not the things we build
on top.
The point is: the things we build on top have costs associated with
them, and I aim to minimize them. For instance, to do a "call()" kind of
interface, you generally need to pre-setup some per-cpu mappings so that
you can just do a single iowrite32() to kick the call off. Those
per-cpu mappings have a cost if you want them to be high-performance, so
my argument is that you ideally want to limit the number of times you
have to do this. My current design reduces this to "once".
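
As a rough sketch of what I mean (the per-cpu setup and the register
layout here are hypothetical, not the actual vbus ABI), the steady-state
path is just:

#include <linux/io.h>
#include <linux/percpu.h>
#include <linux/types.h>

/*
 * One pre-mapped doorbell register per cpu, ioremap()ed once at init
 * time. That one-time setup is the "expensive" part we only want to
 * pay once.
 */
static DEFINE_PER_CPU(void __iomem *, call_doorbell);

/* Per-call cost: a single 32-bit write that traps to the host. */
static void fast_call(u32 handle)
{
	void __iomem *db = get_cpu_var(call_doorbell);

	iowrite32(handle, db);
	put_cpu_var(call_doorbell);
}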
> There's no such thing as raw PCI. Every PCI device has a protocol. The
> protocol virtio chose is optimized for virtualization.

And it's a question of how that protocol scales, more than how the
protocol works.
Obviously the general idea of the protocol works, as vbus itself is
implemented as a PCI-BRIDGE and is therefore limited to the underlying
characteristics that I can get out of PCI (like PIO latency).
> As I've mentioned before, prioritization is available on x86,

But as I've mentioned, it doesn't work very well.

> and coalescing scales badly.

Depends on what is scaling. Scaling vcpus? Yes, you are right.
Scaling the number of devices? No, this is where it improves.
> irq window exits ought to be pretty rare, so we're only left with
> injection vmexits. At around 1us/vmexit, even 100,000 interrupts/vcpu
> (which is excessive) will only cost you 10% cpu time.

1us is too much for what I am building, IMHO.
> You're free to demultiplex an MSI to however many consumers you want,
> there's no need for a new bus for that.

Hmmm... can you elaborate?
> Do you use DNS? We use PCI-SIG. If Novell is a PCI-SIG member you can
> get a vendor ID and control your own virtio space.

Yeah, we have our own ID. I am more concerned about making this design
make sense outside of PCI-oriented environments.
> That's a bug, not a feature. It means poor scaling as the number of
> vcpus increases and as the number of devices increases.

vcpu count increases: I agree (and am ok with that, as I expect low vcpu
count machines to be typical).

nr of devices increases: I disagree. Can you elaborate?
> Windows,

Work in progress.

> large guests

Can you elaborate? I am not familiar with the term.

> and multiqueue out of your design.

AFAICT, multiqueue should work quite nicely with vbus. Can you
elaborate on where you see the problem?
>>> x86 APIC is priority aware.
>> Have you ever tried to use it?
> I haven't, but Windows does.

Yeah, it doesn't really work well. It's an extremely rigid model that
(IIRC) only lets you prioritize in 16 groups spaced by IDT vector (0-15
are one level, 16-31 are another, etc). Most of the embedded PICs I have
worked with supported direct remapping, etc. But in any case, Linux
doesn't support it, so we are hosed no matter how good it is.
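
In other words (my shorthand here, not kernel code), the priority class
is just the upper nibble of the vector:

/*
 * The local APIC derives an interrupt's priority class from bits 7:4 of
 * its vector: vectors 0x00-0x0f are one class, 0x10-0x1f the next, and
 * so on -- 16 fixed classes of 16 vectors each.
 */
static inline unsigned int apic_prio_class(unsigned int vector)
{
	return vector >> 4;
}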
>> More importantly, they had to build back-end busses too, no?
> They had to build connectors just like you propose to do. But you
> still need vbus-connector-lguest and vbus-connector-s390 because
> they all talk to the host differently. So what's changed? the names?

The fact that they don't need to redo most of the in-kernel backend
stuff. Just the connector.
> Well, venet doesn't complement virtio-net, and virtio-pci doesn't
> complement vbus-connector.

Agreed, but virtio complements vbus by virtue of virtio-vbus.