Re: [PATCH] vsock: only load vmci transport on VMware hypervisor by default
From: Jorgen S. Hansen
Date: Tue Aug 22 2017 - 09:07:12 EST
> On Aug 22, 2017, at 11:54 AM, Stefan Hajnoczi <stefanha@xxxxxxxxxx> wrote:
>
> On Fri, Aug 18, 2017 at 11:07:37PM +0000, Dexuan Cui wrote:
>>> Not all features need to be supported. For example, VMCI supports
>>> SOCK_DGRAM while Hyper-V and virtio do not. But features that are
>>> available should behave identically.
>> I totally agree, though I'm afraid Hyper-V may have a little more limitations
>> compared to VMware/KVM duo to the <VM_ID, ServiceID> <--> <cid, port>
>> mapping.
>>
>>>> Can we use the 'protocol' parameter in the socket() function:
>>>> int socket(int domain, int type, int protocol)
>>>>
>>>> IMO currently the 'protocol' is not really used.
>>>> I think we can modify __vsock_core_init() to allow multiple transport layers
>>> to
>>>> be registered, and we can define different 'protocol' numbers for
>>>> VMware/KVM/Hyper-V, and ask the application to explicitly specify what
>>> should
>>>> be used. Considering compatibility, we can use the default transport in a
>>> given
>>>> VM depending on the underlying hypervisor.
>>>
>>> I think AF_VSOCK should hide the transport from users/applications.
>> Ideally yes, but let's consider the KVM-on-KVM nested scenario: when
>> an application in the Level-1 VM creates an AF_VSOCK socket and call
>> connect() for it, how can we know if the app is trying to connect to
>> the Level-0 host, or connect to the Level-2 VM? We can't.
>
> We *can* by looking at the destination CID. Please take a look at
> drivers/misc/vmw_vmci/vmci_route.c:vmci_route() to see how VMCI handles
> nested virt.
>
> It boils down to something like this:
>
> static int vsock_stream_connect(struct socket *sock, struct sockaddr *addr,
> int addr_len, int flags)
> {
> ...
> if (remote_addr.svm_cid == VMADDR_CID_HOST)
> transport = host_transport;
> else
> transport = guest_transport;
>
> It's easy for connect(2) but Jorgen mentioned it's harder for listen(2)
> because the socket would need to listen on both transports. We define
> two new constants VMADDR_CID_LISTEN_FROM_GUEST and
> VMADDR_CID_LISTEN_FROM_HOST for bind(2) so that applications can decide
> which side to listen on.
If a socket is bound to VMADDR_CID_HOST, we would consider that socket as bound to the host side transport, so that would be the same as VMADDR_CID_LISTEN_FROM_GUEST. For the guest, we have IOCTL_VM_SOCKETS_GET_LOCAL_CID, so that could be used to get and bind a socket to the guest transport (VMCI will always return the guest CID as the local one, if the VMCI driver is used in a guest, and it looks like virtio will do the same). We could treat VMADDR_CID_ANY as always being the guest transport, since that is the use case where you donât know upfront what your CID is, if we donât want to listen on all transports. So we would use the host transport, if a socket is bound to VMADDR_CID_HOST, or if there is no guest transport, and in all other cases use the guest transport. However, having a couple of symbolic names like you suggest certainly makes it more obvious, and could be used in combination with this. It would be a plus if existing applications would function as intended in most cases.
> Or the listen socket could simply listen to
> both sides.
The only problem here would be the potential for a guest and a host app to have a conflict wrt port numbers, even though they would be able to operate fine, if restricted to their appropriate transport.
Thanks,
Jorgen