Re: [PATCH v5] virtio/vsock: replace virtio_vsock_pkt with sk_buff

From: Bobby Eshleman
Date: Wed Dec 07 2022 - 13:08:43 EST

Next message: Vishal Moola (Oracle): "[PATCH mm-unstable v2] ext4: Convert mext_page_double_lock() to mext_folio_double_lock()"
Previous message: Michał Cłapiński: "Re: [PATCH 1/2] sched/membarrier: Introduce MEMBARRIER_CMD_GET_REGISTRATIONS"
In reply to: Paolo Abeni: "Re: [PATCH v5] virtio/vsock: replace virtio_vsock_pkt with sk_buff"
Next in thread: Michael S. Tsirkin: "Re: [PATCH v5] virtio/vsock: replace virtio_vsock_pkt with sk_buff"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, Dec 07, 2022 at 10:33:22AM +0100, Paolo Abeni wrote:
> On Mon, 2022-11-21 at 12:01 +0000, Bobby Eshleman wrote:
> > On Tue, Dec 06, 2022 at 11:20:21AM +0100, Paolo Abeni wrote:
> > > Hello,
> > >
> > > On Fri, 2022-12-02 at 09:35 -0800, Bobby Eshleman wrote:
> > > [...]
> > > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > > > index 35d7eedb5e8e..6c0b2d4da3fe 100644
> > > > --- a/include/linux/virtio_vsock.h
> > > > +++ b/include/linux/virtio_vsock.h
> > > > @@ -3,10 +3,129 @@
> > > > #define _LINUX_VIRTIO_VSOCK_H
> > > >
> > > > #include <uapi/linux/virtio_vsock.h>
> > > > +#include <linux/bits.h>
> > > > #include <linux/socket.h>
> > > > #include <net/sock.h>
> > > > #include <net/af_vsock.h>
> > > >
> > > > +#define VIRTIO_VSOCK_SKB_HEADROOM (sizeof(struct virtio_vsock_hdr))
> > > > +
> > > > +enum virtio_vsock_skb_flags {
> > > > + VIRTIO_VSOCK_SKB_FLAGS_REPLY = BIT(0),
> > > > + VIRTIO_VSOCK_SKB_FLAGS_TAP_DELIVERED = BIT(1),
> > > > +};
> > > > +
> > > > +static inline struct virtio_vsock_hdr *virtio_vsock_hdr(struct sk_buff *skb)
> > > > +{
> > > > + return (struct virtio_vsock_hdr *)skb->head;
> > > > +}
> > > > +
> > > > +static inline bool virtio_vsock_skb_reply(struct sk_buff *skb)
> > > > +{
> > > > + return skb->_skb_refdst & VIRTIO_VSOCK_SKB_FLAGS_REPLY;
> > > > +}
> > >
> > > I'm sorry for the late feedback. The above is extremelly risky: if the
> > > skb will land later into the networking stack, we could experience the
> > > most difficult to track bugs.
> > >
> > > You should use the skb control buffer instead (skb->cb), with the
> > > additional benefit you could use e.g. bool - the compiler could emit
> > > better code to manipulate such fields - and you will not need to clear
> > > the field before release nor enqueue.
> > >
> > > [...]
> > >
> >
> > Hey Paolo, thank you for the review. For my own learning, this would
> > happen presumably when the skb is dropped? And I assume we don't see
> > this in sockmap because it is always cleared before leaving sockmap's
> > hands? I sanity checked this patch with an out-of-tree patch I have that
> > uses the networking stack, but I suspect I didn't see issues because my
> > test harness didn't induce dropping...
>
> skb->_skb_refdst carries a dst and a flag in the less significative bit
> specifying if the dst is refcounted. Passing to the network stack a skb
> overloading such bit semanthic is quite alike intentionally corrupting
> the kernel memory.
>
> > I originally avoided skb->cb because the reply flag is set at allocation
> > and would potentially be clobbered by a pass through the networking
> > stack. The reply flag would be used after a pass through the networking
> > stack (e.g., during transmission at the device level and when sockets
> > close while skbs are still queued for xmit).
>
> I assumed the 'tap_delivered' and 'reply' flag where relevant only
> while the skb is owned by the virtio socket. If you need to preserve
> such information _after_ delivering the skb to the network stack, that
> is quite unfortunate - and skb->cb will not work.
>
> The are a couple of options for adding new metadata inside the skb,
> both of them are quite discouraged/need a strong use-case:
>
> - adding new fields in some skb hole
> - adding a new skb extension.
>
> Could you please describe the 'reply' and 'tap_delivered' life-cycle
> and their interaction with the network stack?
>

vsock has two layers: the socket layer and the transport layer.

'tap_delivered' is set and used only in the transport layer, upon
transmission, so I think skb->cb can easily be used for it.

'reply' is set only for some types of packets like reset and response
packets. Sometimes this is originating from the socket layer, and
sometimes from the transport. They share the same sending function.

The reply flag is later used by the transport as a hint to queue a
worker to process more RX packets.

After looking at the reply flag closer, I think it can be derived from
the packet header. If derived like so, this problem should go away.

Given that the current implementation does not use the networking stack,
I suggest that we move both flags to skb->cb for this patch, to avoid
any possible future confusion. If future patches change vsock to use the
networking stack, then they should also include the changes to handle
these flags appropriately.

Thanks,
Bobby

Next message: Vishal Moola (Oracle): "[PATCH mm-unstable v2] ext4: Convert mext_page_double_lock() to mext_folio_double_lock()"
Previous message: Michał Cłapiński: "Re: [PATCH 1/2] sched/membarrier: Introduce MEMBARRIER_CMD_GET_REGISTRATIONS"
In reply to: Paolo Abeni: "Re: [PATCH v5] virtio/vsock: replace virtio_vsock_pkt with sk_buff"
Next in thread: Michael S. Tsirkin: "Re: [PATCH v5] virtio/vsock: replace virtio_vsock_pkt with sk_buff"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]