Provided:
- tcpdump
- shell script to create the reset.
Hard Facts:
A rlogin (or telnet) connection to a system on a local
network is closed at random ("connect reset by peer") long
after the session is established and working.
We are able now to reproduce (still not able to predict it)
the "reset".
- Reset occur between system with kernel 2.2.12
(2.2.5, 2.2.10 etc..).
- We first noticed the probleme between 2 productions
systems (2.0.36) while one of the system on
the network was (kernel) 2.2.2. I agree this
one is difficult to swallow! (I am just reporting
hard fact)
- "Reset" are prone to occure when the traffic between
both system is heavy.
- X is not involved, A telnet from the console
is also showing "connect reset by peer"
- Seems not be hardware related nor ethernet card
related as we changed ethernet card and type on systems
involved.
Status:
- Random "connection reset", are occuring for 2 months
now, we upgraded to "fresh" distribution (RedHat-6.0)
and new kernel, problem still present...
I looked at usenet and FAQ, found nothing I could
relate to the problem.
I start to be "dry" on that one, my last hope is
a fancy timing problem deep in the kernel.
At least could somebody reproduce the event.
Any suggestions to narrow the event conditions
are welcome.
Showing the "bug":
- From system A we open 4 windows sessions to system B,
and start this script (nothing fancy) on each windows
to keep up traffic on the connection.
#! /bin/sh
#---------------------------------
#shell script to sustaine traffic.
#---------------------------------
while [ 1 ];
do
ls -ails /
done
- After minutes (5 to 10) a first screen connection is
"reset by peer", further screens will drop also as the
time is going.
tcpdump
Test was done between system A (jaguar.zoo) and B (panthere.zoo),
the shell scripts were running on B.
All traffic from/to B was monitored. Test started at
21:31:28.380929 ,
at 21:36:12.632368 system A issued a connection reset packet
;-----------------------------------------------------------------------------
21:31:28.380929 jaguar.zoo.1019 > panthere.zoo.login: P 1114527453:1114527454(1) ack 1138414377 win 15928 <nop,nop,timestamp 967030 1253145> (DF) [tos 0x10] (ttl 64, id 12375)
21:31:28.382165 panthere.zoo.login > jaguar.zoo.1019: P 1:7(6) ack 1 win 32120 <nop,nop,timestamp 1253960 967030> (DF) [tos 0x10] (ttl 64, id 16312)
21:31:28.397311 jaguar.zoo.1019 > panthere.zoo.login: . ack 7 win 15928 <nop,nop,timestamp 967032 1253960> (DF) [tos 0x10] (ttl 64, id 12376)
[.......................]
21:36:12.509797 panthere.zoo.login > jaguar.zoo.1019: P 9807308:9807749(441) ack 2 win 32120 <nop,nop,timestamp 1282373 995443> (DF) [tos 0x10] (ttl 64, id 36277)
21:36:12.528400 jaguar.zoo.1019 > panthere.zoo.login: . ack 9807749 win 15928 <nop,nop,timestamp 995445 1282373> (DF) [tos 0x10] (ttl 64, id 27357)
21:36:12.529562 panthere.zoo.login > jaguar.zoo.1019: P 9807749:9808058(309) ack 2 win 32120 <nop,nop,timestamp 1282375 995445> (DF) [tos 0x10] (ttl 64, id 36278)
21:36:12.536593 panthere.zoo.login > jaguar.zoo.1019: P 9808058:9808801(743) ack 2 win 32120 <nop,nop,timestamp 1282376 995445> (DF) [tos 0x10] (ttl 64, id 36279)
21:36:12.548399 jaguar.zoo.1019 > panthere.zoo.login: . ack 9808801 win 15928 <nop,nop,timestamp 995447 1282375> (DF) [tos 0x10] (ttl 64, id 27358)
21:36:12.549550 panthere.zoo.login > jaguar.zoo.1019: P 9808801:9809094(293) ack 2 win 32120 <nop,nop,timestamp 1282377 995447> (DF) [tos 0x10] (ttl 64, id 36280)
21:36:12.568399 jaguar.zoo.1019 > panthere.zoo.login: . ack 9809094 win 15928 <nop,nop,timestamp 995449 1282377> (DF) [tos 0x10] (ttl 64, id 27359)
21:36:12.569756 panthere.zoo.login > jaguar.zoo.1019: P 9809094:9809550(456) ack 2 win 32120 <nop,nop,timestamp 1282379 995449> (DF) [tos 0x10] (ttl 64, id 36281)
21:36:12.576129 panthere.zoo.login > jaguar.zoo.1019: P 9809550:9810293(743) ack 2 win 32120 <nop,nop,timestamp 1282380 995449> (DF) [tos 0x10] (ttl 64, id 36282)
21:36:12.588400 jaguar.zoo.1019 > panthere.zoo.login: . ack 9810293 win 15928 <nop,nop,timestamp 995451 1282379> (DF) [tos 0x10] (ttl 64, id 27360)
21:36:12.589351 panthere.zoo.login > jaguar.zoo.1019: P 9810293:9810439(146) ack 2 win 32120 <nop,nop,timestamp 1282381 995451> (DF) [tos 0x10] (ttl 64, id 36283)
21:36:12.608399 jaguar.zoo.1019 > panthere.zoo.login: . ack 9810439 win 15928 <nop,nop,timestamp 995453 1282381> (DF) [tos 0x10] (ttl 64, id 27361)
21:36:12.610109 panthere.zoo.login > jaguar.zoo.1019: P 9810439:9811115(676) ack 2 win 32120 <nop,nop,timestamp 1282383 995453> (DF) [tos 0x10] (ttl 64, id 36284)
21:36:12.628400 jaguar.zoo.1019 > panthere.zoo.login: . ack 9811115 win 15928 <nop,nop,timestamp 995455 1282383> (DF) [tos 0x10] (ttl 64, id 27362)
21:36:12.630142 panthere.zoo.login > jaguar.zoo.1019: P 9811115:9811784(669) ack 2 win 32120 <nop,nop,timestamp 1282385 995455> (DF) [tos 0x10] (ttl 64, id 36285)
21:36:12.632368 jaguar.zoo.1019 > panthere.zoo.login: R 1114527455:1114527455(0) win 0 [tos 0x10] (ttl 254, id 27363)
21:36:12.648400 jaguar.zoo.1019 > panthere.zoo.login: . ack 9811784 win 15928 <nop,nop,timestamp 995457 1282385> (DF) [tos 0x10] (ttl 64, id 27364)
21:36:12.649087 panthere.zoo.login > jaguar.zoo.1019: R 1148226160:1148226160(0) win 0 [tos 0x10] (ttl 255, id 36286)
21:36:43.701605 panthere.zoo.ntp > lion.zoo.ntp: v3 client strat 4 poll 6 prec -18 dist 0.045028 disp 0.031143 ref tigre.zoo@3144879357.702836990 [|ntp] (ttl 64, id 36287)
21:36:43.702175 lion.zoo.ntp > panthere.zoo.ntp: v3 server strat 3 poll 6 prec -16 dist 0.043365 disp 0.036743 ref dns1.cmc.ec.gc.ca@3144879391.990171432 [|ntp] (ttl 64, id 61995)
21:36:48.692244 arp who-has panthere.zoo tell lion.zoo
21:36:48.692503 arp reply panthere.zoo is-at 0:0:c0:92:e9:c1
21:37:01.701165 panthere.zoo.ntp > tigre.zoo.ntp: v3 client strat 4 poll 6 prec -18 dist 0.045028 disp 0.031356 ref tigre.zoo@3144879357.702836990 [|ntp] (ttl 64, id 36288)
21:37:01.701832 tigre.zoo.ntp > panthere.zoo.ntp: v3 server strat 3 poll 6 prec -18 dist 0.043930 disp 0.021347 ref otc2.psu.edu@3144879371.959439277 [|ntp] (ttl 64, id 59591)
21:37:06.699612 arp who-has panthere.zoo tell tigre.zoo
21:37:06.699863 arp reply panthere.zoo is-at 0:0:c0:92:e9:c1
21:37:47.700109 panthere.zoo.ntp > lion.zoo.ntp: v3 client strat 4 poll 6 prec -18 dist 0.044921 disp 0.031875 ref tigre.zoo@3144879421.701067924 [|ntp] (ttl 64, id 36289)
21:37:47.700695 lion.zoo.ntp > panthere.zoo.ntp: v3 server strat 3 poll 6 prec -16 dist 0.043365 disp 0.037704 ref dns1.cmc.ec.gc.ca@3144879455.991878509 [|ntp] (ttl 64, id 62347)
21:37:52.691206 arp who-has panthere.zoo tell lion.zoo
21:37:52.691456 arp reply panthere.zoo is-at 0:0:c0:92:e9:c1
21:38:05.699705 panthere.zoo.ntp > tigre.zoo.ntp: v3 client strat 4 poll 6 prec -18 dist 0.044921 disp 0.032089 ref tigre.zoo@3144879421.701067924 [|ntp] (ttl 64, id 36290)
21:38:05.700378 tigre.zoo.ntp > panthere.zoo.ntp: v3 server strat 3 poll 6 prec -18 dist 0.047775 disp 0.025131 ref otc2.psu.edu@3144879435.937322616 [|ntp] (ttl 64, id 59602)
21:38:10.693429 arp who-has panthere.zoo tell tigre.zoo
21:38:10.693678 arp reply panthere.zoo is-at 0:0:c0:92:e9:c1
21:38:23.699295 panthere.zoo.ntp > lion.zoo.ntp: v3 client strat 4 poll 7 prec -18 dist 0.048767 disp 0.035339 ref tigre.zoo@3144879485.699578285 [|ntp] (ttl 64, id 36291)
21:38:23.699840 lion.zoo.ntp > panthere.zoo.ntp: v3 server strat 3 poll 7 prec -16 dist 0.043365 disp 0.038116 ref dns1.cmc.ec.gc.ca@3144879455.991878509 [|ntp] (ttl 64, id 62525)
21:38:28.698983 arp who-has lion.zoo tell panthere.zoo
21:38:28.699255 arp reply lion.zoo is-at 0:0:b4:84:d6:34
;-----------------------------------------------------------------------------
A bientot
==========================================================================
Jean-Marc Pigeon Internet: Jean-Marc.Pigeon@safe.ca
SAFE Inc. Phone: (514) 493-4280 Fax: (514) 493-1946
REGULUS, a real time accounting program for ISP
REGULUS' Home base <http://www.regulus.safe.ca>
==========================================================================
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/