Re: [RFC PATCH 00/17] virtual-bus

From: Gregory Haskins
Date: Thu Apr 02 2009 - 06:45:12 EST


Avi Kivity wrote:
> Gregory Haskins wrote:
>>
>>
>> I think there is a slight disconnect here. This is *exactly* what I am
>> trying to do. You can of course do this many ways, and I am not denying
>> it could be done a different way than the path I have chosen. One
>> extreme would be to just slam a virtio-net specific chunk of code
>> directly into kvm on the host. Another extreme would be to build a
>> generic framework into Linux for declaring arbitrary IO types,
>> integrating it with kvm (as well as other environments such as lguest,
>> userspace, etc), and building a virtio-net model on top of that.
>>
>> So in case it is not obvious at this point, I have gone with the latter
>> approach. I wanted to make sure it wasn't kvm specific or something
>> like pci specific so it had the broadest applicability to a range of
>> environments. So that is why the design is the way it is. I understand
>> that this approach is technically "harder/more-complex" than the "slam
>> virtio-net into kvm" approach, but I've already done that work. All we
>> need to do now is agree on the details ;)
>>
>>
>
> virtio is already non-kvm-specific (lguest uses it) and
> non-pci-specific (s390 uses it).

Ok, then to be more specific, I need it to be more generic than it
already is. For instance, I need it to be able to integrate with
shm_signals. If we can do that without breaking the existing ABI, that
would be great! Last I looked, it was somewhat entwined there, so I
didn't try...but I admit that I didn't try very hard since I already had
the IOQ library ready to go.
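Just to illustrate the kind of hook I mean (purely a sketch, not code
from the series; the shm_signal_inject() name/signature and header are
assumed from the posted shm-signal patch and may differ), a virtio
transport's per-virtqueue kick could be pointed at a shm_signal instead
of a trapping register:

    #include <linux/virtio.h>
    #include <linux/shm_signal.h>   /* assumed header from the shm-signal patch */

    struct vbus_virtqueue {
            struct virtqueue  *vq;
            struct shm_signal *signal;   /* doorbell from the vbus connector */
    };

    /* notify callback of the kind handed to vring_new_virtqueue() */
    static void vbus_virtio_notify(struct virtqueue *vq)
    {
            struct vbus_virtqueue *vvq = vq->priv;

            /*
             * Ring the shared-memory doorbell instead of trapping on a
             * PIO/MMIO register.  shm_signal_inject() is assumed here;
             * the real entry point may take extra arguments.
             */
            shm_signal_inject(vvq->signal);
    }

The point is only that the notification plumbing needs to be pluggable
at this level; everything above it (the ring format, the feature bits)
could stay as it is today.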

>
>>> That said, I don't think we're bound today by the fact that we're in
>>> userspace.
>>>
>> You will *always* be bound by the fact that you are in userspace. It's
>> purely a question of "how much" and "does anyone care". Right now,
>> the answer is "a lot (roughly 45x slower)" and "at least Greg's customers
>> do". I have no doubt that this can and will change/improve in the
>> future. But it will always be true that no matter how much userspace
>> improves, the kernel-based solution will always be faster. It's simple
>> physics. I'm cutting out the middleman to ultimately reach the same
>> destination as the userspace path, so userspace can never be equal.
>>
>
> If you have a good exit mitigation scheme you can cut exits by a
> factor of 100; so the userspace exit costs are cut by the same
> factor. If you have good copyless networking APIs you can cut the
> cost of copies to zero (well, to the cost of get_user_pages_fast(),
> but a kernel solution needs that too).

"exit mitigation' schemes are for bandwidth, not latency. For latency
it all comes down to how fast you can signal in both directions. If
someone is going to do a stand-alone request-reply, its generally always
going to be at least one hypercall and one rx-interrupt. So your speed
will be governed by your signal path, not your buffer bandwidth.
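To put that in symbolic terms (no particular numbers implied), a single
stand-alone request-reply costs roughly:

    round_trip ~= t_signal(guest->host)   (hypercall / PIO exit)
                + t_dispatch              (wake + route to the device model)
                + t_service               (actually perform the I/O)
                + t_signal(host->guest)   (inject the rx interrupt)

Mitigation only amortizes how often you pay t_signal per unit of
bandwidth; an isolated request-reply pays the full path both ways every
time, and t_dispatch is exactly where the extra hop to userspace shows
up.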

What I've done is shown that you can use techniques other than buffering
the head of the queue to do exit mitigation for bandwidth, while still
maintaining a very short signaling path for latency. I also argue that
the latter will always be optimal in the kernel, though admittedly by
what degree is still TBD. Anthony thinks he can make the difference
negligible, and I would love to see it, but I am skeptical.
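For the curious, the general shape of the technique is a consumer-side
"no signal needed" flag in the shared ring. This is a standalone,
userspace-style illustration of the idea, not the actual IOQ code, and
memory-ordering barriers are omitted for brevity:

    #include <stdatomic.h>
    #include <stdbool.h>

    struct ring_shared {
            atomic_uint head;           /* producer advances */
            atomic_uint tail;           /* consumer advances */
            atomic_bool need_signal;    /* consumer wants a kick when set */
    };

    /* producer side (e.g. guest tx): publish, then kick only if asked */
    static void ring_produce(struct ring_shared *r, void (*kick)(void))
    {
            atomic_fetch_add(&r->head, 1);       /* descriptor now visible */
            if (atomic_load(&r->need_signal))
                    kick();                      /* hypercall / irq inject */
    }

    /* consumer side (e.g. host backend): drain without further signals */
    static void ring_consume(struct ring_shared *r, void (*process)(unsigned))
    {
            for (;;) {
                    atomic_store(&r->need_signal, false);  /* awake: no kicks */

                    while (atomic_load(&r->tail) != atomic_load(&r->head))
                            process(atomic_fetch_add(&r->tail, 1));

                    atomic_store(&r->need_signal, true);   /* idle: re-arm */

                    /* close the race against work queued just before re-arm */
                    if (atomic_load(&r->tail) == atomic_load(&r->head))
                            break;
            }
    }

A burst of requests pays for one kick and is then drained signal-free,
while a lone request still gets signaled immediately, which is why the
latency path stays short.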

-Greg


