Re: 1000ms delay in networking stack or driver, new bug?

David Woodhouse (David.Woodhouse@mvhi.com)
Thu, 23 Sep 1999 10:58:07 +0100


kkeil@suse.de said:
> Line slows down, a running ping (direct to the peer) freezed. He
> started started tcpdump and a new ping, now he saw that with the new
> ping also the missing packets from the old ping are seen in tcpdump,
> and also get a quick response.

kuznet@ms2.inr.ac.ru said:
> I repeat the request: gentlemen, please, make tcpdump both on wire
> and on local host with "tcpdump -i eth0 -n -v -s 256 -x icmp", when
> pinging. The investigation even will not be started until I get at
> least one report of this kind.

If this is still the 'constant outgoing packet queue length' bug I was
reporting about this time last year, then I've done tcpdumps at either end. I
don't still have them, though. They'll be in linux-kernel and/or linux-isdn
archives somewhere if they actually got posted.

A quick dredge through my archives shows up the mail I've included below.

It doesn't contain the remote tcpdump, but I remember having done it. Not at
the other end of the ISDN link, because that was obviously not a machine I had
control over, but I ran a ping to somewhere else fairly close, and got a
tcpdump from there. It agreed 100% with the local tcpdump which I _did_ include
in the mail below, give or take the expected few milliseconds. It was enough
to convince me that the packets _really_ are sent at the time the local
machine thinks they are sent. What you see here is really what's on the wire.
Hopefully this is enough to start the investigation.

I can't reproduce this any more, because after about six months of this we just
ditched the ISDN and installed a leased line instead. The ISDN only comes up
now when we have stupidly large downloads to loadbalance. We never keep it up
for long enough to see whether this problem still occurs.

------------------------------
To: isdn4linux@hub-wue.franken.de
Subject: Outgoing packets queued for ever...
Date: Sat, 07 Nov 1998 01:11:17 +0000
From: David Woodhouse <David.Woodhouse@mvhi.com>

Hello all,

I sent this mail to linux-{kernel,isdn}@vger.rutgers.edu a couple of months
ago, but received no response (other than others saying that they've seen the
same problem). So I'm reporting it here, too.

Kernels between 2.1.120 and 2.1.127-pre2 haven't fixed the problem. I have now
established that it can be temporarily fixed by removing the offending ippp
device completely ('isdnctrl delif ippp0') and reconfiguring it, so at least I
don't have to reboot every time.

> I'm seeing extremely strange behaviour in recent 2.1 kernels - I noticed it in
> 2.1.120, and dropping back to 2.1.117 hasn't cured it.
>
> The problem seems to be that packets due to be sent out the ISDN interface
> aren't being sent until another packet arrives on the queue. So the queue is
> never emptied - there are always packets ready to be sent, but even when the
> link is idle, they remain queued until a new packet arrives to push them off
> the queue and out into the ether.
>
> I can start trying to ping the other end of the link, and get constant round
> trip times of extremely close to an integer number of seconds, equal to the
> current size of the packet queue, because the packets are remaining on the
> outgoing queue until enough new packets have arrived to flush the preceding
> packets.
>
>
> For example... The link is idle, with some packets queued, but unsent, from a
> stalled TCP connection. tcpdump is running and shows no activity since the
> packet at 14:36:08.550691.
>
> At 14:37:11, I start to ping the other side of the ISDN link. Each time 'ping'
> sends a packet, I see a packet go out the ISDN link. It's just that the packet
> that gets transmitted bears no relation to the packet that 'ping' tried to
> send. The first six packets are from the TCP connection, then when 'ping' tries
> to send its seventh packet, its first one is actually transmitted, etc. Hence a
> fairly constant round trip time of just over six seconds.
>
> The length of the queue varies between the ideal zero and about nine. I have
> yet to see it get any higher.
>
> --- tcpdump output:
>
> 14:36:08.550691 195.11.50.200.8080 > 158.152.16.50.1709: . ack 430 win 8760 <nop,nop,timestamp 1072144 15104214>
>
> <Now the link is idle for a minute, even though there are packets to be sent>
> <It remains idle until I start 'ping' in another window, at which point...>
>
> 14:37:11.424123 158.152.16.50.1709 > 195.11.50.200.8080: . ack 218 win 32120 <nop,nop,timestamp 15104276 1072075> (DF)
> 14:37:11.454123 195.11.50.200.8080 > 158.152.16.50.1709: R 3963254873:3963254873(0) win 32120 <nop,nop,timestamp 15104276 1072075> (DF)
>
> <The packet is sent at exactly the same time that 'ping' attempts to send
> its first ICMP echo request. The response is fast, because it's my
> ISP's web cache that it's talking to - I don't think there's a similar
> problem with incoming packets.>
>
> 14:37:12.424019 158.152.16.50.1709 > 195.11.50.200.8080: . ack 218 win 32120 <nop,nop,timestamp 15104376 1072077> (DF)
> 14:37:12.454019 195.11.50.200.8080 > 158.152.16.50.1709: R 3963254873:3963254873(0) win 32120 <nop,nop,timestamp 15104376 1072077> (DF)
>
> <Exactly one second later, with 'ping's second packet>
>
> 14:37:13.423915 158.152.16.50.1709 > 195.11.50.200.8080: . ack 218 win 32120 <nop,nop,timestamp 15104477 1072079> (DF)
> 14:37:13.513915 195.11.50.200.8080 > 158.152.16.50.1709: R 3963254873:3963254873(0) win 32120 <nop,nop,timestamp 15104477 1072079> (DF)
>
> 14:37:14.423810 158.152.16.50.1709 > 195.11.50.200.8080: . ack 218 win 32120 <nop,nop,timestamp 15104577 1072081> (DF)
> 14:37:14.453810 195.11.50.200.8080 > 158.152.16.50.1709: R 3963254873:3963254873(0) win 32120 <nop,nop,timestamp 15104577 1072081> (DF)
>
> 14:37:15.423706 158.152.16.50.1709 > 195.11.50.200.8080: . ack 218 win 32120 <nop,nop,timestamp 15105073 1072091> (DF)
> 14:37:15.453706 195.11.50.200.8080 > 158.152.16.50.1709: R 3963254873:3963254873(0) win 32120 <nop,nop,timestamp 15105073 1072091> (DF)
>
> 14:37:16.423602 158.152.16.50.1709 > 195.11.50.200.8080: F 429:429(0) ack 218 win 32120 <nop,nop,timestamp 15107679 1072091> (DF)
> 14:37:16.453602 195.11.50.200.8080 > 158.152.16.50.1709: R 3963254873:3963254873(0) win 32120 <nop,nop,timestamp 15107679 1072091> (DF)
>
> <Now ping has tried to send six packets, and each time it has tried,
> an unrelated packet has been sent at exactly the same time.>
>
> 14:37:17.423498 158.152.16.50 > 158.152.1.222: icmp: echo request
> 4500 0054 113c 0000 4001 1a2d 9e98 1032
> 9e98 01de 0800 8613 b901 00
> 14:37:17.453498 158.152.1.222 > 158.152.16.50: icmp: echo reply
> 4500 0054 c521 0000 ff01 a746 9e98 01de
> 9e98 1032 0000 8e13 b901 00
>
> <Finally an ICMP echo packet. Six seconds late.>
>
> 14:37:18.423393 158.152.16.50 > 158.152.1.222: icmp: echo request
> 4500 0054 1150 0000 4001 1a19 9e98 1032
> 9e98 01de 0800 c519 b901 01
> 14:37:18.453393 158.152.1.222 > 158.152.16.50: icmp: echo reply
> 4500 0054 c621 0000 ff01 a646 9e98 01de
> 9e98 1032 0000 cd19 b901 01
>
> <etc...>
>
> 14:37:19.423289 158.152.16.50 > 158.152.1.222: icmp: echo request
> 4500 0054 1163 0000 4001 1a06 9e98 1032
> 9e98 01de 0800 331a b901 02
> 14:37:19.483289 158.152.1.222 > 158.152.16.50: icmp: echo reply
> 4500 0054 c821 0000 ff01 a446 9e98 01de
> 9e98 1032 0000 3b1a b901 02
>
> 14:37:20.423185 158.152.16.50 > 158.152.1.222: icmp: echo request
> 4500 0054 1176 0000 4001 19f3 9e98 1032
> 9e98 01de 0800 9b1a b901 03
> 14:37:20.463185 158.152.1.222 > 158.152.16.50: icmp: echo reply
> 4500 0054 d021 0000 ff01 9c46 9e98 01de
> 9e98 1032 0000 a31a b901 03
>
> 14:37:21.423081 158.152.16.50 > 158.152.1.222: icmp: echo request
> 4500 0054 1189 0000 4001 19e0 9e98 1032
> 9e98 01de 0800 071b b901 04
> 14:37:21.453081 158.152.1.222 > 158.152.16.50: icmp: echo reply
> 4500 0054 d521 0000 ff01 9746 9e98 01de
> 9e98 1032 0000 0f1b b901 04
>
>
> --- ping output:
> firewall /root # date; ping demon-du.demon.co.uk
> Tue Sep 8 14:37:11 BST 1998
> PING demon-du.demon.co.uk (158.152.1.222): 56 data bytes
> 64 bytes from 158.152.1.222: icmp_seq=0 ttl=255 time=6038.1 ms
> 64 bytes from 158.152.1.222: icmp_seq=1 ttl=255 time=6037.6 ms
> 64 bytes from 158.152.1.222: icmp_seq=2 ttl=255 time=6069.6 ms
> 64 bytes from 158.152.1.222: icmp_seq=3 ttl=255 time=6044.7 ms
> 64 bytes from 158.152.1.222: icmp_seq=4 ttl=255 time=6036.8 ms
> 64 bytes from 158.152.1.222: icmp_seq=5 ttl=255 time=6037.8 ms
> 64 bytes from 158.152.1.222: icmp_seq=6 ttl=255 time=6039.7 ms
> 64 bytes from 158.152.1.222: icmp_seq=7 ttl=255 time=6041.3 ms
> 64 bytes from 158.152.1.222: icmp_seq=8 ttl=255 time=6037.6 ms
> 64 bytes from 158.152.1.222: icmp_seq=9 ttl=255 time=6037.4 ms
>
>
>
> Kernel is a stock 2.1.120, Intel UP.
> ISDN is a Teles 16.3PnP.
> This machine is also masquerading.
> Closing down the link and restarting it seems to make no difference,
> and the machine doesn't show any signs of misbehaviour when routing packets on
> the Ethernet - it has 100Base-T and 10Base-2, both Tulip cards.
>
> Rebooting the machine seems to help. Sometimes the problem comes back within
> minutes, though. Sometimes it takes a few days.
>
> What else can I do to try to find the problem?
>

---- ---- ----
David Woodhouse David.Woodhouse@mvhi.com Office: (+44) 1223 810302
Project Leader, Process Information Systems Mobile: (+44) 976 658355
Axiom (Cambridge) Ltd., Swaffham Bulbeck, Cambridge, CB5 0NA, UK.
finger dwmw2@ferret.lmh.ox.ac.uk for PGP key.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/