Re: [PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type

From: Jonathan Maxwell
Date: Sat Oct 29 2016 - 20:22:37 EST


On Fri, Oct 28, 2016 at 5:08 AM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
> On Thu, 2016-10-27 at 12:54 -0500, Thomas Falcon wrote:
>> On 10/27/2016 10:26 AM, Eric Dumazet wrote:
>> > On Wed, 2016-10-26 at 11:09 +1100, Jon Maxwell wrote:
>> >> We recently encountered a bug where a few customers using ibmveth on the
>> >> same LPAR hit an issue where a TCP session hung when large receive was
>> >> enabled. Closer analysis revealed that the session was stuck because the
>> >> one side was advertising a zero window repeatedly.
>> >>
>> >> We narrowed this down to the fact the ibmveth driver did not set gso_size
>> >> which is translated by TCP into the MSS later up the stack. The MSS is
>> >> used to calculate the TCP window size and as that was abnormally large,
>> >> it was calculating a zero window, even although the sockets receive buffer
>> >> was completely empty.
>> >>
>> >> We were able to reproduce this and worked with IBM to fix this. Thanks Tom
>> >> and Marcelo for all your help and review on this.
>> >>
>> >> The patch fixes both our internal reproduction tests and our customers tests.
>> >>
>> >> Signed-off-by: Jon Maxwell <jmaxwell37@xxxxxxxxx>
>> >> ---
>> >> drivers/net/ethernet/ibm/ibmveth.c | 20 ++++++++++++++++++++
>> >> 1 file changed, 20 insertions(+)
>> >>
>> >> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
>> >> index 29c05d0..c51717e 100644
>> >> --- a/drivers/net/ethernet/ibm/ibmveth.c
>> >> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>> >> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>> >> int frames_processed = 0;
>> >> unsigned long lpar_rc;
>> >> struct iphdr *iph;
>> >> + bool large_packet = 0;
>> >> + u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
>> >>
>> >> restart_poll:
>> >> while (frames_processed < budget) {
>> >> @@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>> >> iph->check = 0;
>> >> iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
>> >> adapter->rx_large_packets++;
>> >> + large_packet = 1;
>> >> }
>> >> }
>> >> }
>> >>
>> >> + if (skb->len > netdev->mtu) {
>> >> + iph = (struct iphdr *)skb->data;
>> >> + if (be16_to_cpu(skb->protocol) == ETH_P_IP &&
>> >> + iph->protocol == IPPROTO_TCP) {
>> >> + hdr_len += sizeof(struct iphdr);
>> >> + skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
>> >> + skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>> >> + } else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
>> >> + iph->protocol == IPPROTO_TCP) {
>> >> + hdr_len += sizeof(struct ipv6hdr);
>> >> + skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
>> >> + skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>> >> + }
>> >> + if (!large_packet)
>> >> + adapter->rx_large_packets++;
>> >> + }
>> >> +
>> >>
>> > This might break forwarding and PMTU discovery.
>> >
>> > You force gso_size to device mtu, regardless of real MSS used by the TCP
>> > sender.
>> >
>> > Don't you have the MSS provided in RX descriptor, instead of guessing
>> > the value ?
>> >
>> >
>> >
>> The MSS is not always available unfortunately, so this is the best solution there is at the moment.
>
> Hmm... then what about skb_shinfo(skb)->gso_segs ?
>
> ip_rcv() for example has :
>
> __IP_ADD_STATS(net,
> IPSTATS_MIB_NOECTPKTS + (iph->tos & INET_ECN_MASK),
> max_t(unsigned short, 1, skb_shinfo(skb)->gso_segs));
>
>

Okay thanks Eric. As the MSS is the main concern, let me work with the
team here and see if we find can a better way to do this. Will take care of
the other points you raised at the same time.

>
> Also prefer : (skb->protocol == htons(ETH_P_IP)) tests
>
> And the ipv6 test is wrong :
>
> } else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
> iph->protocol == IPPROTO_TCP) {
>
>
> Since iph is a pointer to ipv4 iphdr .
>
>
>