Re: [RFC PATCH v2 0/8] virtio/vsock: experimental zerocopy receive

From: Arseniy Krasnov
Date: Tue Jun 14 2022 - 00:23:44 EST


On 13.06.2022 11:54, Stefano Garzarella wrote:
> On Thu, Jun 09, 2022 at 12:33:32PM +0000, Arseniy Krasnov wrote:
>> On 09.06.2022 11:54, Stefano Garzarella wrote:
>>> Hi Arseniy,
>>> I left some comments in the patches, and I'm adding something also here:
>> Thanks for comments
>>>
>>> On Fri, Jun 03, 2022 at 05:27:56AM +0000, Arseniy Krasnov wrote:
>>>>                              INTRODUCTION
>>>>
>>>>     Hello, this is an experimental implementation of virtio vsock zerocopy
>>>> receive. It was inspired by TCP zerocopy receive by Eric Dumazet. This API uses
>>>> the same idea: call 'mmap()' on the socket's descriptor, then every
>>>> 'getsockopt()' call will fill the provided vma area with pages of virtio RX
>>>> buffers. After the received data has been processed by the user, the pages must
>>>> be freed by a 'madvise()' call with the MADV_DONTNEED flag set (if the user
>>>> doesn't call 'madvise()', the next 'getsockopt()' will fail).
>>>
>>> If it is not too time-consuming, can we have a table/list to compare this and the TCP zerocopy?
>> You mean compare API with more details?
>
> Yes, maybe a comparison from the user's point of view to do zero-copy with TCP and VSOCK.
>
>>>
>>>>
>>>>                                 DETAILS
>>>>
>>>>     Here is how the mapping with mapped pages looks exactly: the first page of
>>>> the mapping contains an array of trimmed virtio vsock packet headers (each one
>>>> contains only the length of data on the corresponding pages and the 'flags' field):
>>>>
>>>>     struct virtio_vsock_usr_hdr {
>>>>         uint32_t length;
>>>>         uint32_t flags;
>>>>         uint32_t copy_len;
>>>>     };
>>>>
>>>> Field 'length' allows the user to know the exact size of the payload within each
>>>> sequence of pages, and 'flags' allows the user to handle SOCK_SEQPACKET flags
>>>> (such as message bounds or record bounds). Field 'copy_len' is described below
>>>> in the 'v1->v2' part. All other pages are data pages from the RX queue.
>>>>
>>>>             Page 0      Page 1      Page N
>>>>
>>>>     [ hdr1 .. hdrN ][ data ] .. [ data ]
>>>>           |        |       ^           ^
>>>>           |        |       |           |
>>>>           |        *-------------------*
>>>>           |                |
>>>>           |                |
>>>>           *----------------*
>>>>
>>>>     Of course, a single header could represent an array of pages (when the
>>>> packet's buffer is bigger than one page). So here is an example of a detailed
>>>> mapping layout for some set of packets. Let's consider that we have the
>>>> following sequence of packets: 56 bytes, 4096 bytes and 8200 bytes. All pages:
>>>> 0, 1, 2, 3, 4 and 5 will be inserted into the user's vma (vma is large enough).
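
To make the page arithmetic of this example concrete, here is a minimal userspace sketch (not code from the patchset). It assumes a 4096-byte page, that every packet starts on a fresh data page, and that page 0 holds only headers. 'PAGE_SZ', 'pkt_pages()' and 'layout_pages()' are hypothetical names used for illustration, and 'copy_len' is ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u /* assumption: 4 KiB pages */

/* Trimmed header, as given in the cover letter. */
struct virtio_vsock_usr_hdr {
    uint32_t length;
    uint32_t flags;
    uint32_t copy_len;
};

/* Number of data pages occupied by a packet of 'length' bytes. */
static uint32_t pkt_pages(uint32_t length)
{
    return (length + PAGE_SZ - 1) / PAGE_SZ;
}

/*
 * Walk 'n' headers and store the index of the first data page of each
 * packet in 'first_page'. Page 0 is the header page, so data starts at
 * page 1. Returns the total number of pages in the mapping, header
 * page included.
 */
static uint32_t layout_pages(const struct virtio_vsock_usr_hdr *hdrs,
                             size_t n, uint32_t *first_page)
{
    uint32_t next = 1;

    assert(hdrs != NULL && first_page != NULL);

    for (size_t i = 0; i < n; i++) {
        first_page[i] = next;
        next += pkt_pages(hdrs[i].length);
    }

    return next;
}
```

For the 56, 4096 and 8200 byte packets, this gives first data pages 1, 2 and 3 and six pages in total (0 to 5), since the 8200-byte packet spans three pages.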
>>>
>>> In order to have a "userspace polling-friendly approach" and reduce the number of syscalls, can we allow, for example, the userspace to mmap at least the first header before packets arrive?
>>> Then the userspace can poll a flag or other fields in the header to understand that there are new packets.
>> You mean that to avoid the 'poll()' syscall, the user will spin on some flag provided by the kernel on some mapped page? I think yes, this is ok. Also I think that I can avoid the 'madvise()' call
>> to clear the memory mapping before each 'getsockopt()': let 'getsockopt()' do the 'madvise()' job by removing the pages from the previous data. In this case only one system call is needed: 'getsockopt()'.
>
> Yes, that's right. I mean to support both, poll() for interrupt-based applications and the ability to actively poll a variable in the shared memory for applications that want to minimize latency.
I see. In this case it seems 'vsock_sock' will have to maintain a list of such shared pages, to update every page when new data is available, and sometimes check whether the mapping was removed
by the user (because we don't have a munmap callback in 'proto_ops', only mmap), for example by using a ref counter for each such shared page.

Thanks
>
> Thanks,
> Stefano
>