Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+

From: Ilpo Järvinen
Date: Sat May 31 2008 - 14:38:24 EST


On Sat, 31 May 2008, Håkon Løvdal wrote:

> Ilpo Järvinen wrote:
> > Hmm, are the other end's processes still there? ...I'd be interested to know
> > what they're doing at the moment...
>
> > I meant that end where you see this '-'. I suppose it's easy for you to
> > figure out which process is the right one, something that wouldn't be so
> > easy with Ingo's test case, which forks/exits numerous times.
>
> > Died? Do you mean that they don't exist at all at the other end anymore?

Ok, like you said, this is not exactly the same, though it might be due to
the same bug. In Ingo's case both endpoints were doing pretty healthily,
with periodic window probes as expected. In your case TCP is not doing
window probes but got that interesting RTO value.

So you had that '-' earlier and you checked at that time but the
connection is now already dead?

> The ssh connection used for copying (using the command <ssh old_pc "cd
> /directory; tar cvf - *" | pv | tar xvf ->) died in the following way:
> ...
> Read from remote host old_pc: Connection timed out
> 51.4GB 4:26:19 [3.29MB/s] [<=> ]
> tar: Unexpected EOF in archive
> tar: Unexpected EOF in archive
> tar: Error is not recoverable: exiting now
>
> and there are currently no traces of those ssh processes any longer on
> the new PC, only these two active ssh interactive connections are present:

:-( I would so much have liked to see what they were doing.

> old_pc>sed -n '1p; /:0016/p' /proc/net/tcp
> sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt
> uid timeout inode
> 12: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000
> 0 0 7627 1 f384b080 3000 0 0 2 -1
> 17: 111111AC:0016 480111AC:CDBB 01 00000000:00000000 02:0001AA7B 00000000
> 0 0 1110320 2 f02e6580 201 53 7 3 -1
> 20: 111111AC:0016 480111AC:AB31 01 00000000:00000000 02:000A18F1 00000000
> 0 0 583506 4 f71a8080 201 40 29 3 -1
> 21: 111111AC:0016 480111AC:E4E9 01 00000B50:00000000 01:7D1F8746 00000000
> 0 0 398713 5 f71a8580 205 40 1 36 -1
> 23: 111111AC:0016 480111AC:D359 01 000010F8:00000000 01:7D19A035 00000000
> 0 0 396426 5 f71a8a80 202 42 1 144 -1
> 25: 111111AC:0016 480111AC:8565 01 00000B50:00000000 01:7CEBA7D1 00000000
> 0 0 349113 5 eeeaf580 204 40 1 26 -1
> old_pc>

These 7C/7D... values in the tm->when column certainly look strange. Which
TCP variant do you have in use (cat /proc/sys/net/ipv4/tcp_congestion_control)?
It seems that at least vegas, veno and yeah contain 0x7fffffff there for
some rtt, which could perhaps somehow leak.
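
To make it easier to see how far off those timer values are, here is a
rough Python sketch that decodes one row of the /proc/net/tcp output above.
It is only an illustration under a few assumptions: the helper names
(decode_addr, decode_row) are made up for this example, the host is assumed
to be little-endian (as on x86), tm->when is assumed to be reported in
clock ticks with USER_HZ = 100, and the timer codes follow my reading of
the kernel's proc output (1 = retransmit, 2 = keepalive, 3 = time-wait,
4 = zero-window probe).

```python
import socket
import struct

# Timer codes as I understand the "tr" field; treat this mapping as an assumption.
TIMER_NAMES = {0: "none", 1: "retransmit", 2: "keepalive",
               3: "time-wait", 4: "zero-window probe"}


def decode_addr(hex_addr):
    """Turn 'AABBCCDD:PPPP' into (dotted_ip, port), assuming a little-endian host."""
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def decode_row(row, user_hz=100):
    """Decode the leading fields of one (unwrapped) /proc/net/tcp row."""
    fields = row.split()
    timer_hex, when_hex = fields[5].split(":")
    ticks = int(when_hex, 16)
    return {
        "local": decode_addr(fields[1]),
        "remote": decode_addr(fields[2]),
        "state": int(fields[3], 16),          # 0x01 is ESTABLISHED, 0x0A is LISTEN
        "timer": TIMER_NAMES.get(int(timer_hex, 16), "unknown"),
        "when_ticks": ticks,
        "when_seconds": ticks / user_hz,      # assumes USER_HZ = 100
    }


# Row 21 from the output above, with the mail line wrap undone.
row = ("21: 111111AC:0016 480111AC:E4E9 01 00000B50:00000000 "
       "01:7D1F8746 00000000 0 0 398713 5 f71a8580 205 40 1 36 -1")
info = decode_row(row)
print(info["local"], info["remote"], info["timer"])
print("timer fires in about %.0f days" % (info["when_seconds"] / 86400))
```

Under those assumptions the retransmit timer on that socket is scheduled
hundreds of days into the future, which fits the suspicion that a value
close to 0x7fffffff ended up in the RTO calculation.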

--
i.