Re: [RFC PATCH v5 00/19] virtio/vsock: introduce SOCK_SEQPACKET support
From: Arseny Krasnov
Date: Tue Feb 23 2021 - 23:31:34 EST
On 23.02.2021 17:50, Stefano Garzarella wrote:
> On Mon, Feb 22, 2021 at 03:23:11PM +0100, Stefano Garzarella wrote:
>> Hi Arseny,
>>
>> On Thu, Feb 18, 2021 at 08:33:44AM +0300, Arseny Krasnov wrote:
>>> This patchset implements support of SOCK_SEQPACKET for virtio
>>> transport.
>>> As SOCK_SEQPACKET guarantees to preserve record boundaries, two
>>> new packet operations were added: one to mark the start of a record
>>> and one to mark its end (SEQ_BEGIN and SEQ_END below). Both
>>> operations carry metadata to maintain boundaries and payload
>>> integrity. The metadata is introduced by adding a special header with
>>> two fields - message count and message length:
>>>
>>> struct virtio_vsock_seq_hdr {
>>> 	__le32 msg_cnt;
>>> 	__le32 msg_len;
>>> } __attribute__((packed));
>>>
>>> This header is transmitted as payload of SEQ_BEGIN and SEQ_END
>>> packets (buffer of second virtio descriptor in chain) in the same way
>>> as data is transmitted in RW packets. Payload was chosen as the buffer
>>> for this header to avoid touching the first virtio buffer, which
>>> carries the packet header, because someone could check that the size
>>> of this buffer is equal to the size of the packet header. To send a
>>> record, a packet with the start marker is sent first (its header
>>> contains the length of the record and the counter), then the counter
>>> is incremented and all data is sent as usual 'RW' packets, and finally
>>> SEQ_END is sent (it also carries the message counter, which is the
>>> counter of SEQ_BEGIN + 1); after sending SEQ_END the counter is
>>> incremented again. On the receiver's side, the length of the record is
>>> known from the packet with the start-of-record marker. To check that
>>> no packets were dropped by the transport, the counters of two
>>> sequential SEQ_BEGIN and SEQ_END packets are checked (the counter of
>>> SEQ_END must be greater than the counter of SEQ_BEGIN by 1) and the
>>> length of data between the two markers is compared to the length in
>>> the SEQ_BEGIN header.
>>> As packets of one socket are not reordered on either the vsock or
>>> the vhost transport layer, such markers allow the original record to
>>> be restored on the receiver's side. If the user's buffer is smaller
>>> than the record length, then all data that does not fit is dropped.
>>> As with stream sockets, the maximum length of a datagram is not
>>> limited, because the same credit logic is used. The difference from
>>> stream sockets is that the user is not woken up until the whole record
>>> is received or an error occurs. The implementation also supports the
>>> 'MSG_EOR' and 'MSG_TRUNC' flags.
>>> Tests are also implemented.
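
To make the receive-side check described in the cover letter above concrete, here is a minimal userspace sketch. It is not code from the patchset: struct seq_hdr is a host-endian stand-in for virtio_vsock_seq_hdr, and seq_record_valid(), seq_copy_len() and seq_check_demo() are hypothetical helpers invented only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Host-endian stand-in for virtio_vsock_seq_hdr (assumption: no __le32). */
struct seq_hdr {
	uint32_t msg_cnt;
	uint32_t msg_len;
};

/*
 * Hypothetical helper: returns true if the record delimited by a
 * SEQ_BEGIN/SEQ_END pair arrived intact.
 */
static bool seq_record_valid(const struct seq_hdr *begin,
			     const struct seq_hdr *end,
			     uint32_t rw_bytes_seen)
{
	/* The SEQ_END counter must be exactly SEQ_BEGIN's counter + 1. */
	if (end->msg_cnt != (uint32_t)(begin->msg_cnt + 1))
		return false;

	/* RW payload seen between the markers must match SEQ_BEGIN's length. */
	return rw_bytes_seen == begin->msg_len;
}

/*
 * Hypothetical helper: number of bytes handed to the user when the
 * buffer may be smaller than the record; the remainder is dropped.
 */
static size_t seq_copy_len(uint32_t msg_len, size_t user_buf_len)
{
	return msg_len < user_buf_len ? msg_len : user_buf_len;
}

/* Small usage example covering the intact, short and truncated cases. */
static void seq_check_demo(void)
{
	struct seq_hdr begin = { .msg_cnt = 4, .msg_len = 100 };
	struct seq_hdr end = { .msg_cnt = 5, .msg_len = 100 };

	assert(seq_record_valid(&begin, &end, 100));  /* intact record */
	assert(!seq_record_valid(&begin, &end, 60));  /* RW data went missing */
	assert(seq_copy_len(begin.msg_len, 64) == 64); /* 36 bytes dropped */
}
```

The sketch only models the two checks (counter delta and byte count) and the truncation rule; credit accounting and wakeups are left out.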
>> I reviewed the first part (af_vsock.c changes), tomorrow I'll review
>> the rest. That part looks great to me, only found a few minor issues.
> I reviewed the rest of it as well, left a few minor comments, but I
> think we're well on track.
>
> I'll take a better look at the specification patch tomorrow.
Great, thank you.
>
> Thanks,
> Stefano
>
>> In the meantime, however, I have a doubt, especially with regard
>> to other transports besides virtio.
>>
>> Should we hide the begin/end marker sending in the transport?
>>
>> I mean, should the transport just provide a seqpacket_enqueue()
>> callback?
>> Inside it then the transport will send the markers. This is because
>> some transports might not need to send markers.
>>
>> But thinking about it more, they could actually implement stubs for
>> those calls, if they don't need to send markers.
>>
>> So I think for now it's fine since it allows us to reuse a lot of
>> code, unless someone has some objection.
I thought about that, I'll try to implement it in the next version. Let's see...
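
A rough userspace sketch of the "stubs" option discussed above, to show how a shared send path could stay the same for transports that need markers and those that don't. Every name here (struct seq_ops, seq_send_record(), the demo transports) is hypothetical and not taken from the patchset or from af_vsock.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-transport hooks for sending one record. */
struct seq_ops {
	int (*seq_begin)(void *ctx, size_t len);
	int (*seq_end)(void *ctx);
	int (*enqueue)(void *ctx, const void *buf, size_t len);
};

/* Counters so the demo can observe what a transport actually did. */
struct demo_ctx {
	int markers_sent;
	size_t bytes_sent;
};

/* Transport that needs markers: each hook "sends" one marker packet. */
static int marker_begin(void *ctx, size_t len)
{
	(void)len;
	((struct demo_ctx *)ctx)->markers_sent++;
	return 0;
}

static int marker_end(void *ctx)
{
	((struct demo_ctx *)ctx)->markers_sent++;
	return 0;
}

/* Transport that does not need markers: the hooks are empty stubs. */
static int stub_begin(void *ctx, size_t len)
{
	(void)ctx;
	(void)len;
	return 0;
}

static int stub_end(void *ctx)
{
	(void)ctx;
	return 0;
}

static int demo_enqueue(void *ctx, const void *buf, size_t len)
{
	(void)buf;
	((struct demo_ctx *)ctx)->bytes_sent += len;
	return 0;
}

/* Shared core path: identical for both kinds of transport. */
static int seq_send_record(const struct seq_ops *ops, void *ctx,
			   const void *buf, size_t len)
{
	int err = ops->seq_begin(ctx, len);

	if (err)
		return err;
	err = ops->enqueue(ctx, buf, len);
	if (err)
		return err;
	return ops->seq_end(ctx);
}

static const struct seq_ops marker_transport = {
	.seq_begin = marker_begin,
	.seq_end = marker_end,
	.enqueue = demo_enqueue,
};

static const struct seq_ops stub_transport = {
	.seq_begin = stub_begin,
	.seq_end = stub_end,
	.enqueue = demo_enqueue,
};

static void seq_ops_demo(void)
{
	struct demo_ctx a = { 0 }, b = { 0 };

	assert(seq_send_record(&marker_transport, &a, "hello", 5) == 0);
	assert(a.markers_sent == 2 && a.bytes_sent == 5);

	assert(seq_send_record(&stub_transport, &b, "hello", 5) == 0);
	assert(b.markers_sent == 0 && b.bytes_sent == 5);
}
```

The alternative (a single seqpacket_enqueue() callback that hides the markers inside the transport) would collapse the three hooks into one, at the cost of each transport duplicating the begin/data/end sequencing.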
>>
>> Thanks,
>> Stefano
>>
>