Re: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

From: Stefan Hajnoczi
Date: Fri Aug 18 2017 - 11:37:35 EST


On Fri, Aug 18, 2017 at 03:07:30AM +0000, Dexuan Cui wrote:
> > From: Jorgen S. Hansen [mailto:jhansen@xxxxxxxxxx]
> > Sent: Thursday, August 17, 2017 08:17
> > >
> > > Putting aside nested virtualization, I want to load the transport (vmci,
> > > Hyper-V, vsock) for which there is paravirtualized hardware present
> > > inside the guest.
> >
> > Good points. Completely agree that this is the desired behavior for a guest.
> >
> >
> > > It's a little trickier on the host side (doesn't matter for Hyper-V and
> > > probably also doesn't for VMware) because the host-side driver is a
> > > software device with no hardware backing it. In KVM we assume the
> > > vhost_vsock.ko kernel module will be loaded sufficiently early.
> >
> > Since the vmci driver is currently tied to PF_VSOCK it hasn't been a problem,
> > but on the host side the VMCI driver has no hardware backing it either, so
> > when we move to a more appropriate solution, this will be an issue for VMCI as
> > well. I'll check our shipped products, but they most likely assume that if an
> > upstreamed vmci module is present, it will be loaded automatically.
>
> Hyper-V Sockets is a standard feature of VMBus v4.0, so we can easily know
> we can and should load iff vmbus_proto_version >= VERSION_WIN10.
>
> > > Things get trickier with nested virtualization because the VM might want
> > > to talk to its host but also to its nested VMs. The simple way of
> > > fixing this would be to allow two transports loaded simultaneously and
> > > route traffic destined to CID 2 to the host transport and all other
> > > traffic to the guest transport.
>
> This sounds a little tricky to me.
> CID is not really used by us, because we only support guest<->host communication,
> and don't support guest<->guest communication. The Hyper-V host references
> every VM by VmID (which is invisible to the VM), and a VM can only talk to the
> host via this feature.

Applications running inside the guest should use VMADDR_CID_HOST (2) to
connect to the host, even on Hyper-V.
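
For illustration, a minimal sketch of what that looks like from a guest
application. The helper names and the port passed in are hypothetical; only
AF_VSOCK, struct sockaddr_vm and VMADDR_CID_HOST come from the existing
userspace headers. Nothing here names a transport, which is the point.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

/* Hypothetical helper: fill in an AF_VSOCK address that targets the host. */
void vsock_host_addr(struct sockaddr_vm *addr, unsigned int port)
{
	assert(addr != NULL);
	memset(addr, 0, sizeof(*addr));
	addr->svm_family = AF_VSOCK;
	addr->svm_cid = VMADDR_CID_HOST;	/* CID 2: "my host" */
	addr->svm_port = port;
}

/*
 * Hypothetical helper: connect a stream socket to a service on the host.
 * Returns the connected fd, or -1 on error (errno is set by the failing
 * socket() or connect() call).
 */
int vsock_connect_host(unsigned int port)
{
	struct sockaddr_vm addr;
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	vsock_host_addr(&addr, port);
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The same source should then work unchanged whichever of the three
hypervisors the guest happens to be running on.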

By the way, we should collaborate on a test suite and a vsock(7) man
page that documents the semantics of AF_VSOCK sockets. This way our
transports will have the same behavior and AF_VSOCK applications will
work on all 3 hypervisors.

Not all features need to be supported. For example, VMCI supports
SOCK_DGRAM while Hyper-V and virtio do not. But features that are
available should behave identically.

> > This is close to the routing the VMCI driver does in a nested environment, but
> > that is with the assumption that there is only one type of transport. Having two
> > different transports would require that we delay resolving the transport type
> > until the socket endpoint has been bound to an address. Things get trickier if
> > listening sockets use VMADDR_CID_ANY - if only one transport is present, this
> > would allow the socket to accept connections from both guests and outer host,
> > but with multiple transports that won't work, since we can't associate a socket
> > with a transport until the socket is bound.
> >
> > >
> > > Perhaps we should discuss these cases a bit more to figure out how to
> > > avoid conflicts over MODULE_ALIAS_NETPROTO(PF_VSOCK).
> >
> > Agreed.
>
> Can we use the 'protocol' parameter in the socket() function:
> int socket(int domain, int type, int protocol)
>
> IMO currently the 'protocol' is not really used.
> I think we can modify __vsock_core_init() to allow multiple transport layers to
> be registered, and we can define different 'protocol' numbers for
> VMware/KVM/Hyper-V, and ask the application to explicitly specify what should
> be used. Considering compatibility, we can use the default transport in a given
> VM depending on the underlying hypervisor.

I think AF_VSOCK should hide the transport from users/applications.
Think of same-on-same nested virtualization: VMware-on-VMware or
KVM-on-KVM. In that case specifying VMCI or virtio doesn't help.

We'd still need to distinguish between "to guest" and "to host"
(currently VMCI has code to do this but virtio does not).

The natural place to distinguish the destination is when dealing with
the sockaddr in connect(), bind(), etc.

Stefan
