Re: [RFC PATCH 00/17] virtual-bus

From: Gregory Haskins
Date: Fri Apr 03 2009 - 09:12:02 EST


Avi Kivity wrote:
> Gregory Haskins wrote:
>> So again, I am proposing for consideration of accepting my work (either
>> in its current form, or something we agree on after the normal review
>> process) not only on the basis of the future development of the
>> platform, but also to keep current components running to their
>> full potential. I will again point out that the code is almost
>> completely off to the side, can be completely disabled with config
>> options, and I will maintain it. Therefore the only real impact is to
>> people who care to even try it, and to me.
>>
>
> Your work is a whole stack. Let's look at the constituents.
>
> - a new virtual bus for enumerating devices.
>
> Sorry, I still don't see the point. It will just make writing drivers
> more difficult. The only advantage I've heard from you is that it
> gets rid of the gunk. Well, we still have to support the gunk for
> non-pv devices so the gunk is basically free. The clean version is
> expensive since we need to port it to all guests and implement
> exciting features like hotplug.
My real objection to PCI is fast-path related. I don't object, per se,
to using PCI for discovery and hotplug. If you use PCI just for those
kinds of things, but then allow the fast path to use more
hypercall-oriented primitives, then I would agree with you. We can
leave PCI emulation in user-space, get it for free, and keep things
relatively tidy.

It's when you start requiring that we stay ABI-compatible with
something like the existing virtio-net in x86 KVM that I think it
starts to get ugly to move it into the kernel. So that is what I
really objected to. I think as long as we are not talking about
trying to make something like that work, it's a much more viable
prospect.

So what I propose is the following:

1) The core vbus design stays the same (or close to it).
2) The vbus-proxy and kvm-guest patches go away.
3) The kvm-host patch changes to work in coordination with the
userspace PCI emulation for things like MSI routing.
4) Qemu will know to create some MSI shim 1:1 with whatever it
instantiates on the bus (and can communicate changes).
5) Any drivers that are written for these new PCI-IDs are allowed to
use a hypercall ABI to talk once they have been probed for that ID
(i.e. they are not limited to PIO or MMIO BAR type access methods);
a rough sketch of this follows below.
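
To make (5) a little more concrete, here is an entirely hypothetical
guest-side sketch (the PCI IDs and the hypercall number are made up
for illustration, not taken from the series): PCI is used only to
probe the device, and the fastpath "kick" is a hypercall rather than
a PIO/MMIO BAR access.

/*
 * Hypothetical guest-side shim: the vendor/device IDs and the
 * hypercall number are placeholders, not from the patch series.
 */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/pci.h>
#include <asm/kvm_para.h>

#define HC_VBUS_SIGNAL 42               /* placeholder hypercall nr */

static struct pci_device_id shim_ids[] = {
        { PCI_DEVICE(0x1af4, 0x10f0) }, /* placeholder IDs */
        { 0, },
};
MODULE_DEVICE_TABLE(pci, shim_ids);

static void shim_kick(unsigned long shm_id)
{
        /*
         * Fastpath doorbell: a hypercall instead of a PIO/MMIO BAR
         * access.  Discovery, hotplug, and MSI routing stay with the
         * emulated PCI device.
         */
        kvm_hypercall1(HC_VBUS_SIGNAL, shm_id);
}

static int shim_probe(struct pci_dev *pdev,
                      const struct pci_device_id *id)
{
        int ret = pci_enable_device(pdev);

        if (ret)
                return ret;

        shim_kick(0);   /* e.g. announce that queue 0 is ready */
        return 0;
}

static void shim_remove(struct pci_dev *pdev)
{
        pci_disable_device(pdev);
}

static struct pci_driver shim_driver = {
        .name     = "vbus-pci-shim",
        .id_table = shim_ids,
        .probe    = shim_probe,
        .remove   = shim_remove,
};

static int __init shim_init(void)
{
        return pci_register_driver(&shim_driver);
}

static void __exit shim_exit(void)
{
        pci_unregister_driver(&shim_driver);
}

module_init(shim_init);
module_exit(shim_exit);
MODULE_LICENSE("GPL");

The point is only that, once the driver has matched on its PCI-ID,
nothing forces its data path through the emulated PCI transport.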

Once I get there, I might have greater clarity on how hard it would
be to emulate the fast-path components as well. It might be easier
than I think.

This is all off the cuff, so it might need some fine tuning before
it's actually workable.

Does that sound reasonable?

>
> - finer-grained point-to-point communication abstractions
>
> Where virtio has ring+signalling together, you layer the two. For
> networking, it doesn't matter. For other applications, it may be
> helpful, perhaps you have something in mind.

Yeah, actually. Thanks for bringing that up.

So the reason why signaling and the ring are distinct constructs in the
design is to facilitate constructs other than rings. For instance,
there may be some models where a flat shared page is better than a
ring. A ring naturally preserves all values in flight, whereas a flat
shared page does not (the last update is always current). There are
some algorithms where a previously posted value is obsoleted by an
update, so rings are inherently a poor fit for that update model. And,
as we know, there are plenty of algorithms where a ring works
perfectly. So I wanted the flexibility to be able to express both.
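
As a minimal sketch of what I mean by the flat-page model (the layout
and names are made up purely for illustration), the producer
overwrites a single slot in place and bumps a generation counter so
the consumer can take a consistent snapshot; anything posted earlier
is simply obsoleted rather than queued the way a ring would queue it:

/* Hypothetical layout, for illustration only. */
#include <linux/types.h>
#include <asm/system.h>         /* smp_wmb()/smp_rmb() */

struct flat_shared_page {
        u32 generation;         /* odd while an update is in flight */
        u32 pad;
        u64 value;              /* last write wins; no history kept */
};

/* Producer: the new value simply obsoletes whatever was there. */
static void post_value(struct flat_shared_page *p, u64 v)
{
        p->generation++;        /* now odd: update in progress */
        smp_wmb();
        p->value = v;
        smp_wmb();
        p->generation++;        /* even again: snapshot is stable */
}

/* Consumer: always sees the most recent consistent value. */
static u64 read_value(struct flat_shared_page *p)
{
        u32 gen;
        u64 v;

        do {
                gen = p->generation;
                smp_rmb();
                v = p->value;
                smp_rmb();
        } while ((gen & 1) || gen != p->generation);

        return v;
}

A ring would have to carry every intermediate value through; here the
consumer only ever observes the latest one, which is exactly the
property those obsoleting-update algorithms want.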

One of the things I have in mind for the flat-page model is that RT
vcpu-priority thing. Another thing I am thinking of is coming up with
a PV LAPIC-type replacement (where we can avoid the EOI trap by
sharing the PIC's state).
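
To give a feel for the shared-state idea (this is entirely
hypothetical, none of these names exist in the series), the EOI could
become a plain store into a page shared between guest and host, which
the host reconciles on its next natural exit instead of forcing a
trap:

/* Entirely hypothetical layout, illustrating the shared-state idea. */
#include <linux/types.h>
#include <linux/bitops.h>

#define PV_APIC_NR_VECTORS 256

struct pv_apic_shared {
        DECLARE_BITMAP(in_service, PV_APIC_NR_VECTORS); /* host-maintained */
        DECLARE_BITMAP(eoi,        PV_APIC_NR_VECTORS); /* guest-written  */
};

/*
 * Guest side: EOI is a store into the shared page instead of a write
 * to the emulated EOI register, so no vmexit is taken here; the host
 * folds the bitmap back in on its next natural exit.
 */
static void pv_apic_eoi(struct pv_apic_shared *s, unsigned int vector)
{
        set_bit(vector, s->eoi);
}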

>
> - your "bidirectional napi" model for the network device
>
> virtio implements exactly the same thing, except for the case of tx
> mitigation, due to my (perhaps pig-headed) rejection of doing things
> in a separate thread, and due to the total lack of sane APIs for
> packet traffic.

Yeah, and this part is neither vbus- nor in-kernel-specific. That was
just a design element of venet-tap. Note, though, that I did design
the vbus/shm-signal infrastructure with rich support for such a notion
in mind, so it wasn't accidental or anything like that.

>
> - a kernel implementation of the host networking device
>
> Given the continuous rejection (or rather, their continuous
> non-adoption-and-implementation) of my ideas re zerocopy networking
> aio, that seems like a pragmatic approach. I wish it were otherwise.

Well, that gives me hope, at least ;)


>
> - a promise of more wonderful things yet to come
>
> Obviously I can't evaluate this.

Right, sorry. I wish I had more concrete examples to show you, but we
only have the venet-tap working at this time. I was going for the
"release early/often" approach to get the core reviewed before we got
too far down a path, but perhaps that was the wrong call in this case.
We will certainly send updates as we get some of the more advanced
models and concepts working.

-Greg
