Re: TCP FIN-fragment failure in 2.2.x

Greg Beeley (gbeeley@ix.netcom.com)
Wed, 10 Mar 1999 11:12:19 +0000


Hello again....

Andrea Arcangeli wrote:
>
> On Tue, 9 Mar 1999, Greg Beeley wrote:
>
> >One solution is to modify tcp_output.c's function tcp_fragment in a way
> >somewhat like this:
> >
> > TCP_SKB_CB(skb)->end_seq -= nsize;
> > skb_trim(skb, skb->len - nsize);
> >
> > /* Fragmenting a packet with a FIN? Don't include the FIN's
> > * segment in the first packet!
> > */
> > if (flags & TCPCB_FLAG_FIN) {
> > TCP_SKB_CB(skb)->end_seq--;
> > }
>
> I think tcp_fragment is just doing the right thing and this patch looks
> bogus to me. But I could be wrong (as 10 minutes ago ;).
>
> And looking at your previous report it looks like flag & TCPCB_FLAG_FIN
> would never been true because from your previous report it seems that the
> loop starts before the first FIN has a way to be on the wire.

Ok -- yes the FIN never made its way to the wire because tcp_fragment
split the FIN packet (which had 500 some bytes of payload as well) into
two pieces, the first of which got stuck in the loop, and the second, which
had the FIN flag, never got transmitted.

Here's the detail, simplified a bit :)

1. 420 byte packet, start seq = 35297 end seq = 35718 (420 bytes + FIN)
2. Other end has 0 window, no xmit.
3. Other end advertises 102 byte window.
4. TCP fragment called to create a new 102 byte packet to send:

nsize = 420 - 102 = 318
(buff)->seq = 35297 + 102 = 35399
(buff)->end_seq = (skb)->end_seq = 35718
(skb)->end_seq -= nsize = 35718 - 318 = 35400 /* includes FIN!!!! */

5. Xmit packet len = 102 start seq = 35297, internal saved end_seq = 35400
6. Since end_seq never sent in packet, only len and start seq, remote end
ACK's 35399
7. Linux TCP says seq 35399 is before write queue head 35400, toss the ack.

> >That corrects this particular problem, but I'm not sure if there are
> >any other implications.
>
> So you mean that your change fixed the stall for you? Did you tried it?

Yes. I actually put a bunch of printk's in the kernel code that showed
something like this:

Mar 9 17:00:49 greg kernel: GRBTCP: checking packet pkt=3942089046
ack=3942089045

prior to the patch in tcp_input.c tcp_clean_rtx_queue inside the while loop
that cleans the write queue:

if (after(scb->end_seq, ack)) break;

The stall occurred with the scb->end_seq being one sequence number higher
than the ACK, even though at the tcpdump level all the numbers matched up
perfectly. It appeared to me that the stored end_seq value in the scb was
incorrect, so I chased it.

After the patch, the connection worked properly. I just tested it again
another dozen times to make really sure, as it would occur normally about
once every 3 to 6 times. But, I've verified it by looking at the syslog
output from the printk's as well, making sure that the condition causing
the original problem did occur (splitting the last packet), since it was
that condition that only happened once every 3 to 6 times the connection
was run, depending of course on what else was going on in the system and
little timing issues :)

Let me know if there are any other ways I can help out with this; I appreciate
you making me 'prove' the issue since that helps preserve the kernel's
stability :)

Thanks,

--------
Greg.Beeley@LightSys.org

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/