That's _very_ interesting. Because 83 has only one very minimal patch to
the network code, and that patch shouldn't even trigger under normal
circumstances (it's only triggered when the receive queue allocation
grows beyond the allowed queue length - which shouldn't happen if the
sender sends sane packets within the window).
Even worse, the 83 patch actually has a bug that should probably crash
the machine if the new code is ever triggered ;-)
I'm including the fix for this bug, but I don't think it's responsible: I
really think you would see much worse problems than just bad throughput
if this bug was ever triggered. I'll take a closer look at the TCP dumps
soon, maybe that gives me some idea.
One thing that I'm starting to suspect is simply a timing issue: the new
code since 1.3.73 has been doing the ACK's from within the timer bottom
half handler, asynchronously from the actual network code. Now, if there
is a race within the slhc compression, it would be triggered a lot more
easily thanks to the more asynchronous network behaviour.
Thanks,
Linus
----------
--- v1.3.83/linux/net/ipv4/tcp_input.c Wed Apr 3 16:06:57 1996
+++ linux/net/ipv4/tcp_input.c Wed Apr 3 22:00:35 1996
@@ -1487,6 +1487,7 @@
if (skb->acked)
break;
__skb_unlink(skb, list);
+ kfree_skb(skb, FREE_READ);
}
}
@@ -1501,20 +1502,24 @@
{
struct sk_buff * skb = list->next;
- for (;;) {
- struct sk_buff * next;
+ if (skb != (struct sk_buff *) list) {
+ for (;;) {
+ struct sk_buff * next;
- if (skb == (struct sk_buff *) list)
- break;
- next = skb->next;
- if (next->seq == skb->seq) {
+ next = skb->next;
+ if (next == (struct sk_buff *) list)
+ break;
if (before(next->end_seq, skb->end_seq)) {
__skb_unlink(next, list);
+ kfree_skb(next, FREE_READ);
continue;
}
- __skb_unlink(skb, list);
+ if (next->seq == skb->seq) {
+ __skb_unlink(skb, list);
+ kfree_skb(skb, FREE_READ);
+ }
+ skb = next;
}
- skb = next;
}
}
----------