Re: [PATCH 4/4] tun: indicate support for USO feature

From: Yuri Benditovich
Date: Thu May 13 2021 - 04:15:20 EST


On Thu, May 13, 2021 at 10:05 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
>
> On Thu, May 13, 2021 at 12:36 PM Yuri Benditovich
> <yuri.benditovich@xxxxxxxxxx> wrote:
> >
> > On Thu, May 13, 2021 at 5:07 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > >
> > > On Wed, May 12, 2021 at 6:37 PM Yuri Benditovich
> > > <yuri.benditovich@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed, May 12, 2021 at 12:10 PM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Wed, May 12, 2021 at 4:32 PM Yuri Benditovich
> > > > > <yuri.benditovich@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Wed, May 12, 2021 at 10:50 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Wed, May 12, 2021 at 1:24 PM Yuri Benditovich
> > > > > > > <yuri.benditovich@xxxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > On Wed, May 12, 2021 at 4:33 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 在 2021/5/11 下午4:33, Yuri Benditovich 写道:
> > > > > > > > > > On Tue, May 11, 2021 at 9:50 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > > > > > > > > >>
> > > > > > > > > >> 在 2021/5/11 下午12:42, Yuri Benditovich 写道:
> > > > > > > > > >>> Signed-off-by: Yuri Benditovich <yuri.benditovich@xxxxxxxxxx>
> > > > > > > > > >>> ---
> > > > > > > > > >>> drivers/net/tun.c | 2 +-
> > > > > > > > > >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > > >>>
> > > > > > > > > >>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > > > > > > > >>> index 84f832806313..a35054f9d941 100644
> > > > > > > > > >>> --- a/drivers/net/tun.c
> > > > > > > > > >>> +++ b/drivers/net/tun.c
> > > > > > > > > >>> @@ -2812,7 +2812,7 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
> > > > > > > > > >>> arg &= ~(TUN_F_TSO4|TUN_F_TSO6);
> > > > > > > > > >>> }
> > > > > > > > > >>>
> > > > > > > > > >>> - arg &= ~TUN_F_UFO;
> > > > > > > > > >>> + arg &= ~(TUN_F_UFO|TUN_F_USO);
> > > > > > > > > >>
> > > > > > > > > >> It looks to me kernel doesn't use "USO", so TUN_F_UDP_GSO_L4 is a better
> > > > > > > > > >> name for this
> > > > > > > > > > No problem, I can change it in v2
> > > > > > > > > >
> > > > > > > > > > and I guess we should toggle NETIF_F_UDP_GSO_l4 here?
> > > > > > > > > >
> > > > > > > > > > No, we do not, because this indicates only the fact that the guest can
> > > > > > > > > > send large UDP packets and have them splitted to UDP segments.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Actually the reverse. The set_offload() controls the tuntap TX path
> > > > > > > > > (guest RX path).
> > > > > > > >
> > > > > > > > The set_offloads does 2 things:
> > > > > > > > 1. At the initialization time qemu probes set_offload(something) to
> > > > > > > > check which features are supported by TAP/TUN.
> > > > > > >
> > > > > > > Note that the probing is used for guest RX features not host RX.
> > > > > >
> > > > > > It looks like the hidden assumption (till now) is that if some feature
> > > > > > is present - it exists simultaneously for host and guest.
> > > > > > See QEMU get_features: if the TAP/TUN does not have UFO both HOST and
> > > > > > GUEST FEATURES are cleared.
> > > > >
> > > > > Kind of, actually the assumption is: if a guest feature
> > > > > (VIRTIO_NET_F_GUEST_XXX) is support, the corresponding host feature
> > > > > (VIRTIO_NET_F_HOST_XXX) is also supported.
> > > >
> > > > So nothing tells us that the TUNSETOFFLOAD is going to set GUEST offloads.
> > > > From if_tun.h
> > > > #define TUN_F_CSUM 0x01 /* You can hand me unchecksummed packets. */
> > > > #define TUN_F_TSO4 0x02 /* I can handle TSO for IPv4 packets */
> > > > #define TUN_F_TSO6 0x04 /* I can handle TSO for IPv6 packets */
> > > > #define TUN_F_TSO_ECN 0x08 /* I can handle TSO with ECN bits. */
> > > > #define TUN_F_UFO 0x10 /* I can handle UFO packets */
> > >
> > > Yes, that's why I replied in another thread to say that there's no way
> > > to refuse GSO packets from userspace, even if TUN_F_XXX is not set via
> > > tun_set_offload().
> > >
> > > E.g you can disable sending GSO packets to guests but you can't reject
> > > GSO packets from guest/userspace.
> >
> > We agree here.
> > Sorry for being unclear. I meant following:
> > According to the comment the TUN_F_CSUM is a _host_ capability.
> > According to the comment the TUN_F_UFO is a _guest_ capability.
> >
> > But surprisingly when TUN receives TUN_F_UFO it does not propagate it
> > anywhere, there is no corresponding NETIF flag.
>
> (It looks like I drop the community and other ccs accidentally, adding
> them back and sorry)
I thought you did it intentionally to avoid the flame
>
> Actually, there is one, NETIF_F_GSO_UDP.
>
> Kernel used to have NETIF_F_UFO, but it was removed due to bugs and
> the lack of real hardware support. Then we found it breaks uABI, so
> Willem tries to make it appear for userspace again, and then it was
> renamed to NETIF_F_GSO_UDP.
>
> But I think it's a bug that we don't proporate TUN_F_UFO to NETIF
> flag, this is a must for the driver that doesn't support
> VIRTIO_NET_F_GUEST_UFO. I just try to disable all offloads and
> mrg_rxbuf, then netperf UDP_STREAM from host to guest gives me bad
> length packet in the guest.
>
> Willem, I think we probably need to fix this.
>
>
> > So in fact TUN_F_UFO is processed by the TUN/TAP exactly as a host capability.
> >
> > >
> > > >
> > > > So, let's write
> > > >
> > > > #define TUN_F_UDP_L4TX 0x20 /* You can send me large UDP packets */
> > >
> > > So if we stick to the assumption "if a guest feature is supported, the
> > > corresponding host feature is supported". There's no need for this.
> > > And I think it's the most clean way.
> >
> > My personal opinion is that it is extremely wrong to extend such an
> > unobvious assumption to each new feature.
>
> This results in inconsistency with other GSO/CSUM flags. And will
> complicate the uAPI (two flags, one for RX another for TX).
>
> Considering the current code works for many years, it's not worth
> bothering I think.
>
> >
> > >
> > > > #define TUN_F_UDP4_L4RX 0x40 /* I can coalesce UDPv4 segments */
> > > > #define TUN_F_UDP6_L4RX 0x80 /* I can coalesce UDPv6 segments */
> > >
> > > Any value to coalesce UDP segments here? It's better to do it in the
> > > TX source (guest).
> >
> > Coalescing is a consent of the guest to receive packets bigger than MTU.
> > Otherwise (if the guest does not agree) the host must segment/fragment
> > packets before transmitting them to the guest.
>
> This looks like a different feature which is not necessarily known by guests?
>
> Kernel supports GRO which can coalesce packets. (It was not supported
> by TAP yet though).
If I understand things correctly this is exactly this feature:
The guest transmits a large UDP packet with the GSO value that means
that the host should segment it _if needed_.
If the destination (for example another guest) is not able to receive
the original large packet it is segmented and the segments pushed to
that guest.
If the destination can receive the original large packet (currently
not) it is just pushed to it as if it was segmented and then
coalesced.
As an example of the same with TCP:
With the current kernel Windows guest receives coalesced packets (for
example when segmented packets come via physical adapter with
coalescing capability) due to the fact that it dynamically enables
VIRTIO_NET_F_GUEST_TSO via VIRTIO_NET_CTRL_GUEST_OFFLOADS which
finally sets NETIF_F_TSO.
>
>
> > It is not related to guest TX.
> >
> > For example, Windows guest is not able to handle large UDP packets
> > (this is not supported by the stack yet).
>
> In this case, the corresponding guest or host features will be
> disabled, and the kernel won't send those kinds of GSO packets to
> guests.
>
> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > 2. Later it configures the guest RX path according to guest's needs/capabilities
> > > > > > > > Typical initialization sequence is (in case the QEMU supports USO feature):
> > > > > > >
> > > > > > > It also depends on whether the backend(TAP) has the support for guest RX.
> > > > > >
> > > > > > In the code of TAP and TUN I do not see any "if the backend has the
> > > > > > support for guest RX".
> > > > >
> > > > > Yes, the detection is implied as you described above.
> > > > >
> > > > > > This is just the IOCTL and set of TUN_F_* bits. Their meaning is
> > > > > > defined in the comments.
> > > > > >
> > > > > > >
> > > > > > > > TAP/TUN set offload 11 (probe for UFO support)
> > > > > > > > TAP/TUN set offload 21 (probe for USO support)
> > > > > > > > TAP/TUN set offload 0
> > > > > > > > ...
> > > > > > > > TAP/TUN set offload 7 (configuration of offloads according to GUEST features)
> > > > > > > >
> > > > > > > > This series of patches is for VIRTIO_NET_F_HOST_USO only, virtio-net
> > > > > > > > features like VIRTIO_NET_F_GUEST_USO_(4/6/whatever) are not defined in
> > > > > > > > the spec yet.
> > > > > > > >
> > > > > > >
> > > > > > > I'm a little bit confused here. Consider you want to implement guest
> > > > > > > TX so there's no need for any modification on the set_offload().
> > > > > >
> > > > > > I do not think so. Please correct me if I'm mistaken:
> > > > > > QEMU needs to indicate the HOST_USO feature (or not indicate).
> > > > > > How can QEMU know the kernel is able to support VIRTIO_NET_HDR_GSO_UDP_L4?
> > > > >
> > > > > Ok, I finally get you idea. Thanks for the patience.
> > > > >
> > > > > But still one issue: Assume we implement VIRTIO_NET_F_HOST_USO. How
> > > > > could we add VIRTIO_NET_F_GUEST_USO in the future? Adding another TUN
> > > > > flag for set_offload()? Seems unnecessary and inconsistency with
> > > > > current TUN flags.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > I think we need to implement both directions at one time as what has
> > > > > > > been partially done in this series:
> > > > > > >
> > > > > >
> > > > > > You actually suggest that we need to start from Linux virtio-net
> > > > > > driver and implement on it both TX and RX.
> > > > > > Our main area is virtio-win drivers and all the rest we do when we can.
> > > > > > Currently we have 2 WIP tasks related to Linux (virtio-net RSS and
> > > > > > libvirt RSS/eBPF) and (my feeling) we hardly can start with additional
> > > > > > one.
> > > > >
> > > > > I can help for the linux driver if you wish.
> > > >
> > > > I understand. Probably I've made a mistake from the beginning:
> > > > At first stage I've prepared the spec change of what we need in hope
> > > > that this will be fast.
> > > > Probably the better way was to prepare RFC patches first then start
> > > > changing the spec.
> > > >
> > > > So the question is what to do now:
> > > > A)
> > > > Finalize patches for guest TX and respective QEMU patches
> > > > Prepare RFC patches for guest RX, get ack on them
> > > > Change the spec
> > > > Finalize patches for guest RX according to the spec
> > > >
> > > > B)
> > > > Reject the patches for guest TX
> > > > Prepare RFC patches for everything, get ack on them
> > > > Change the spec
> > > > Finalize patches for everything according to the spec
> > > >
> > > > I'm for A) of course :)
> > >
> > > I'm for B :)
> > >
> > > The reasons are:
> > >
> > > 1) keep the assumption of tun_set_offload() to simply the logic and
> > > compatibility
> > > 2) it's hard or tricky to touch guest TX path only (e.g the
> > > virtio_net_hdr_from_skb() is called in both RX and TX)
> >
> > I suspect there is _some_ misunderstanding here.
> > I did not touch virtio_net_hdr_from_skb at all.
> >
>
> Typo, actually I meant virtio_net_hdr_to_skb().
OK.
2) tun_get_user() which is guest TX - this is covered
3) tap_get_user() which is guest TX - this is covered
4) {t}packet_send() which is userspace TX - this is OK, the userspace
does not have this feature, it will never use USO

1) receive_buf() which is Linux guest RX - this is interesting
Do you mean that with my patches if Windows VM sends a packet with USO
- the Linux VM will not receive it correctly segmented?
When I send packets with USO via TUN I receive them segmented on
another TUN (2 Windows adapters).


>
> Thanks
>
> > >
> > > Thanks
> > >
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > > This is a reason why I've added to the virtio spec only HOST_USO and
> > > > > > not GUEST_USO4/6.
> > > > > > UDP RSC (which is actually guest rx USO) is not available on Windows
> > > > > > at the moment.
> > > > > >
> > > > > > > 1) set_offload() is for guest RX.
> > > > > > > 2) virtio_net_hdr_to_skb() is for both guest TX and guest RX.
> > > > > > >
> > > > > > > For testing, you can run VM2VM on the same host, and you will get
> > > > > > > everything tested.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > > >
> > > > > > > > > When VIRTIO_NET_F_GUEST_XXX was not negotiated, the corresponding netdev
> > > > > > > > > features needs to be disabled. When host tries to send those packets to
> > > > > > > > > guest, it needs to do software segmentation.
> > > > > > > > >
> > > > > > > > > See virtio_net_apply_guest_offloads().
> > > > > > > > >
> > > > > > > > > There's currently no way (or not need) to prevent tuntap from receiving
> > > > > > > > > GSO packets.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >> And how about macvtap?
> > > > > > > > > > We will check how to do that for macvtap. We will send a separate
> > > > > > > > > > patch for macvtap or ask for advice.
> > > > > > > > > >
> > > > > > > > > >> Thanks
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>> }
> > > > > > > > > >>>
> > > > > > > > > >>> /* This gives the user a way to test for new features in future by
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>