> Can you explain how vbus achieves RDMA?
>
> I also don't see the connection to real time guests.

Both of these are still in development. Trying to stay true to the
"release early and often" mantra, the core vbus technology is being
pushed now so it can be reviewed. Stay tuned for these other developments.
>> I also designed it in such a way that we could, in theory, write one
>> set of (linux-based) backends, and have them work across a variety of
>> environments (such as containers/VMs like KVM, lguest, openvz, but
>> also physical systems like blade enclosures and clusters, or even
>> applications running on the host).
>
> Sorry, I'm still confused. Why would openvz need vbus?

It's just an example. The point is that I abstracted what I think are
the key points of fast-io, memory routing, signal routing, etc, so that
it will work in a variety of (ideally, _any_) environments.

There may not be _performance_ motivations for certain classes of VMs
because they already have decent support, but they may want a connector
anyway to gain some of the new features available in vbus.

And looking forward, the idea is that we have commoditized the backend
so we don't need to redo this each time a new container comes along.
> One point of contention is that this is all managementy stuff and
> should be kept out of the host kernel. Exposing shared memory,
> interrupts, and guest hypercalls can all be easily done from userspace
> (as virtio demonstrates). True, some devices need kernel acceleration,
> but that's no reason to put everything into the host kernel.

See my last reply to Anthony. My two points here are that:

a) having it in-kernel makes it a complete subsystem, which perhaps has
diminished value in kvm, but adds value in most other places that we are
looking to use vbus.

b) the in-kernel code is being overstated as "complex". We are not
talking about your typical virt thing, like an emulated ICH/PCI chipset.
It's really a simple list of devices with a handful of attributes,
managed using established Linux interfaces like sysfs/configfs.
> Exposing devices as PCI is an important issue for me, as I have to
> consider non-Linux guests.

That's your prerogative, but obviously not everyone agrees with you.
Getting non-Linux guests to work is my problem if you choose not to be
part of the vbus community.
> Another issue is the host kernel management code which I believe is
> superfluous.

In your opinion, right?
> Given that, why spread to a new model?

Note: I haven't asked you to (at least, not since April with the vbus-v3
release). Spreading to a new model is currently the role of the
AlacrityVM project, since we disagree on the utility of a new model.
>> A) hardware can only generate byte/word sized requests at a time
>> because that is all the pcb-etch and silicon support. So hardware is
>> usually expressed in terms of some number of "registers".
>
> No, hardware happily DMAs to and fro main memory.

Yes, now walk me through how you set up DMA to do something like a call
when you do not know addresses a priori. Hint: count the number of
MMIO/PIOs you need. If the number is > 1, you've lost.
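To make the "count the MMIO/PIOs" point concrete, here is a minimal
sketch of handing a device a buffer address that was not known ahead of
time through a raw register-style interface. The register names,
offsets, and the 32-bit split are purely hypothetical and do not
describe any real device:

#include <linux/io.h>
#include <linux/kernel.h>
#include <linux/types.h>

#define REG_ADDR_LO   0x00   /* low 32 bits of the buffer address  */
#define REG_ADDR_HI   0x04   /* high 32 bits of the buffer address */
#define REG_DOORBELL  0x08   /* tell the device to go              */

static void post_buffer(void __iomem *regs, dma_addr_t addr)
{
        /* Each write is a separate MMIO access; under virtualization
         * each one is typically a separate exit. */
        writel(lower_32_bits(addr), regs + REG_ADDR_LO);   /* exit #1 */
        writel(upper_32_bits(addr), regs + REG_ADDR_HI);   /* exit #2 */
        writel(1, regs + REG_DOORBELL);                    /* exit #3 */
}

Even this trivial case already costs three exits per posting; a
shared-memory ring instead places the address in guest memory and needs
at most a single kick (or none, if the backend happens to be polling).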
> Some hardware of course uses mmio registers extensively, but not
> virtio hardware. With the recent MSI support no registers are touched
> in the fast path.

Note we are not talking about virtio here. Just raw PCI and why I
advocate vbus over it.
>> D) device-ids are in a fixed width register and centrally assigned
>> from an authority (e.g. PCI-SIG).
>
> That's not an issue either. Qumranet/Red Hat has donated a range of
> device IDs for use in virtio.

Yes, and to get one you have to do what? Register it with kvm.git,
right? Kind of like registering a MAJOR/MINOR, would you agree? Maybe
you do not mind (especially given your relationship to kvm.git), but
there are disadvantages to that model for most of the rest of us.
> Device IDs are how devices are associated with drivers, so you'll need
> something similar for vbus.

Nope, just like you don't need to do anything ahead of time for using a
dynamic misc-device name. You just have both the driver and device know
what they are looking for (it's part of the ABI).
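A minimal sketch of that style of matching, with purely hypothetical
structure and function names (this is not the actual vbus API):

#include <linux/string.h>

struct example_device {
        const char *type;    /* e.g. "virtual-ethernet" */
};

struct example_driver {
        const char *type;    /* the string the driver knows to look for */
};

/* Both sides simply agree on the string as part of the ABI; no
 * central registry hands out a numeric ID. */
static int example_match(const struct example_device *dev,
                         const struct example_driver *drv)
{
        return strcmp(dev->type, drv->type) == 0;
}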
There are no "interrupts" in vbus..only shm-signals. You can establishE) Interrupt/MSI routing is per-device orientedPlease elaborate. What is the issue? How does vbus solve it?
an arbitrary amount of shm regions, each with an optional shm-signal
associated with it. To do this, the driver calls dev->shm(), and you
get back a shm_signal object.
Underneath the hood, the vbus-connector (e.g. vbus-pcibridge) decides
how it maps real interrupts to shm-signals (on a system level, not per
device). This can be 1:1, or any other scheme. vbus-pcibridge uses one
system-wide interrupt per priority level (today this is 8 levels), each
with an IOQ based event channel. "signals" come as an event on that
channel.
So the "issue" is that you have no real choice with PCI. You just get
device oriented interrupts. With vbus, its abstracted. So you can
still get per-device standard MSI, or you can do fancier things like do
coalescing and prioritization.
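For illustration only, a rough sketch of the driver-side flow. The text
above only says that the driver calls dev->shm() and gets a shm_signal
back, so the exact signature and the callback helper below are
assumptions, not the real vbus API:

struct shm_signal;                           /* opaque signal handle */

struct example_vbus_device {
        /* hypothetical: map shm region 'id' ('ptr', 'len' bytes) and
         * optionally return an associated shm-signal */
        int (*shm)(struct example_vbus_device *dev, int id,
                   void *ptr, size_t len, struct shm_signal **signal);
};

/* hypothetical helper: run 'fn' whenever the shm-signal fires */
int example_shm_signal_connect(struct shm_signal *signal,
                               void (*fn)(struct shm_signal *));

static void my_ring_isr(struct shm_signal *signal)
{
        /* consume whatever the device posted into the shared ring */
}

static int my_probe(struct example_vbus_device *dev, void *ring, size_t len)
{
        struct shm_signal *signal;
        int ret;

        ret = dev->shm(dev, 0, ring, len, &signal);   /* region id 0 */
        if (ret < 0)
                return ret;

        return example_shm_signal_connect(signal, my_ring_isr);
}

Note that the driver never sees an interrupt or an MSI vector; how the
signal actually gets delivered is entirely the connector's business.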
>> F) Interrupts/MSI are assumed cheap to inject
>
> Interrupts are not assumed cheap; that's why interrupt mitigation is
> used (on real and virtual hardware).

It's all relative. IDT dispatch and EOI overhead are "baseline" on real
hardware, whereas on virt they are significantly more expensive because
they involve vmenters and vmexits (and you have new exit causes, like
irq-windows, etc, that do not exist in real HW).
>> G) Interrupts/MSI are non-prioritizable.
>
> They are prioritizable; Linux ignores this though (Windows doesn't).
> Please elaborate on what the problem is and how vbus solves it.

It doesn't work right. The x86 sense of interrupt priority is, sorry to
say it, half-assed at best. I've worked with embedded systems that have
real interrupt priority support in the hardware, end to end, including
the PIC. The LAPIC on the other hand is really weak in this dept, and
as you said, Linux doesn't even attempt to use what's there.
>> H) Interrupts/MSI are statically established
>
> Can you give an example of why this is a problem?

Some of the things we are building use the model of having a device that
hands out shm-signals in response to guest events (say, the creation of
an IPC channel). This would generally be handled by a specific device
model instance, and it would need to do this without pre-declaring the
MSI vectors (to use PCI as an example).
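A sketch of that dynamic case, again under assumed names: when the
guest asks for a new IPC channel, the device model allocates a fresh
shm-signal on the spot rather than drawing from a pre-declared set of
MSI vectors:

#include <linux/errno.h>

struct shm_signal;

/* hypothetical device-model helpers */
struct shm_signal *example_shm_signal_alloc(int prio);
int example_publish_channel(int chan_id, struct shm_signal *signal);

static int handle_guest_ipc_request(int chan_id, int prio)
{
        struct shm_signal *signal;

        /* nothing was reserved for this channel ahead of time */
        signal = example_shm_signal_alloc(prio);
        if (!signal)
                return -ENOMEM;

        /* advertise the new channel and its signal back to the guest */
        return example_publish_channel(chan_id, signal);
}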
> What performance oriented items have been left unaddressed?

Well, the interrupt model to name one.
> How do you handle conflicts? Again you need a central authority to
> hand out names or prefixes.

Not really, no. If you really wanted to be formal about it, you could
adopt any of a number of UUID schemes. For instance, perhaps venet
should be "com.novell::virtual-ethernet". Heck, I could use uuidgen.
So the "avi-vbus-connector" can use 1:1, if you prefer. Large vcpuAs another example, the connector design coalesces *all* shm-signalsThat's a bug, not a feature. It means poor scaling as the number of
into a single interrupt (by prio) that uses the same context-switch
mitigation techniques that help boost things like networking. This
effectively means we can detect and optimize out ack/eoi cycles from the
APIC as the IO load increases (which is when you need it most). PCI has
no such concept.
vcpus increases and as the number of devices increases.
counts (which are not typical) and irq-affinity is not a target
application for my design, so I prefer the coalescing model in the
vbus-pcibridge included in this series. YMMV
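For concreteness, a rough sketch (assumed names, not the actual
vbus-pcibridge code) of the coalescing model: one interrupt per
priority level, whose handler drains an event channel and dispatches
each event to its shm-signal, so a single ack/eoi covers everything
drained:

#include <linux/interrupt.h>
#include <linux/types.h>

struct shm_signal;

struct example_event {
        struct shm_signal *signal;           /* which shm-signal fired */
};

/* hypothetical helpers for the per-priority event channel */
bool example_eventq_pop(int prio, struct example_event *ev);
void example_shm_signal_deliver(struct shm_signal *signal);

static irqreturn_t example_prio_isr(int irq, void *data)
{
        int prio = (long)data;
        struct example_event ev;

        /* Drain everything pending: as IO load rises, more signals get
         * delivered per interrupt and the ack/eoi cost amortizes away. */
        while (example_eventq_pop(prio, &ev))
                example_shm_signal_deliver(ev.signal);

        return IRQ_HANDLED;
}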
> Note nothing prevents steering multiple MSIs into a single vector.
> It's a bad idea though.

Yes, it is a bad idea...and not the same thing either. This would
effectively create a shared-line scenario in the irq code, which is not
what happens in vbus.
>> In addition, the signals and interrupts are priority aware, which is
>> useful for things like 802.1p networking where you may establish 8-tx
>> and 8-rx queues for your virtio-net device. x86 APIC really has no
>> usable equivalent, so PCI is stuck here.
>
> x86 APIC is priority aware.

Have you ever tried to use it?
>> Also, the signals can be allocated on-demand for implementing things
>> like IPC channels in response to guest requests since there is no
>> assumption about device-to-interrupt mappings. This is more flexible.
>
> Yes. However given that vectors are a scarce resource you're severely
> limited in that.

The connector I am pushing out does not have this limitation.

> And if you're multiplexing everything on one vector, then you can just
> as well demultiplex your channels in the virtio driver code.

Only per-device, not system wide.
>> And through all of this, this design would work in any guest even if
>> it doesn't have PCI (e.g. lguest, UML, physical systems, etc).
>
> That is true for virtio which works on pci-less lguest and s390.

Yes, and lguest and s390 had to build their own bus-model to do it,
right?

Thank you for bringing this up, because it is one of the main points
here. What I am trying to do is generalize the bus to prevent the
proliferation of more of these isolated models in the future. Build
one, fast, in-kernel model so that we wouldn't need virtio-X and
virtio-Y in the future. They can just reuse the (performance optimized)
bus and models, and only need to build the connector to bridge them.
> That is exactly the design goal of virtio (except it limits itself to
> virtualization).

No, virtio is only part of the picture. It does not include the backend
models, or how to do memory/signal-path abstraction in-kernel, for
instance. But otherwise, virtio as a device model is compatible with
vbus as a bus model. They complement one another.
>> Then device models like virtio can ride happily on top and we end up
>> with a really robust and high-performance Linux-based stack. I don't
>> buy the argument that we already have PCI so let's use it. I don't
>> think it's the best design and I am not afraid to make an investment
>> in a change here because I think it will pay off in the long run.
>
> Sorry, I don't think you've shown any quantifiable advantages.

We can agree to disagree then, eh? There are certainly quantifiable
differences. Waving your hand at the differences to say they are not
advantages is merely an opinion, one that is not shared universally.

The bottom line is that all of these design distinctions are
encapsulated within the vbus subsystem and do not affect the kvm
code-base. So agreement with kvm upstream is not a requirement, but
would be advantageous for collaboration.