RSH problem is back

NIIBE Yutaka (gniibe@mri.co.jp)
Sat, 9 Nov 1996 17:35:07 +0900


Hello,

Kees Bakker writes:
> Since about 1.3.86 the well-known rsh problem is back. A command like:
>
> rsh turku cat some-file </dev/null
>
> might truncate the tail of the output. This was reported about a year ago,
> and in two areas a solution was found (I think). One was to fix rsh.c so
> that it does not quit until the remote end stops (even if there is a EOF in
> stdin). The second was a fix in the kernel as I remember, but I could be
> wrong.

I think this is a kernel problem. However, I think the
linux-net@vger.rutgers.edu list is the more appropriate place for it.

I spent this afternoon investigating this problem. My environment is
kernel 2.0.24.

Here is the output of tcpdump along with the explanation. "lo" stands
for "localhost".

====================
12:35:50.080604 lo.1023 > lo.shell: S 2345183966:2345183966(0) win 512 <mss 3544>
12:35:50.080604 lo.shell > lo.1023: S 3547061138:3547061138(0) ack 2345183967 win 31744 <mss 3544>

12:35:50.080604 lo.1023 > lo.shell: . ack 1 win 31744
12:35:50.080604 lo.1023 > lo.shell: P 1:6(5) ack 1 win 31744 (DF)
12:35:50.100604 lo.shell > lo.1023: . ack 6 win 31744

***** Connection is established between RSHD and RSH,
***** This connection is used for stdin/stdout.
***** RSH sends user info.

12:35:50.360604 lo.1021 > lo.1022: S 4132640101:4132640101(0) win 512 <mss 3544>
12:35:50.360604 lo.1022 > lo.1021: S 2610945449:2610945449(0) ack 4132640102 win 31744 <mss 3544>
12:35:50.360604 lo.1021 > lo.1022: . ack 1 win 31744

***** RSHD connects back to RSH for signal and stderr.

12:35:50.360604 lo.1023 > lo.shell: P 6:13(7) ack 1 win 31744 (DF)
12:35:50.380604 lo.shell > lo.1023: . ack 13 win 31744
12:35:50.380604 lo.1023 > lo.shell: P 13:343(330) ack 1 win 31744 (DF)
12:35:50.400604 lo.shell > lo.1023: . ack 343 win 31744
12:35:51.260604 lo.shell > lo.1023: P 1:2(1) ack 343 win 31744 (DF)
12:35:51.280604 lo.1023 > lo.shell: . ack 2 win 31744
12:35:51.280604 lo.1023 > lo.shell: F 343:343(0) ack 2 win 31744
12:35:51.280604 lo.shell > lo.1023: . ack 344 win 31744

***** RSH sends the command line, and closes the input.
***** From here on, the connection is in the so-called "half-close" state.

12:35:51.430604 lo.shell > lo.1023: . 2:3546(3544) ack 344 win 31744 (DF)
12:35:51.430604 lo.shell > lo.1023: . 3546:7090(3544) ack 344 win 31744 (DF)
12:35:51.430604 lo.shell > lo.1023: . 7090:10634(3544) ack 344 win 31744 (DF)
12:35:51.440604 lo.1023 > lo.shell: . ack 10634 win 30720

***** RSHD sends the output to RSH.

[...] More sends from RSHD to RSH.

12:36:00.280604 lo.1021 > lo.1022: F 1:1(0) ack 1 win 31744
12:36:00.280604 lo.1022 > lo.1021: . ack 2 win 31744

***** The connection of signal/stderr is closed by RSHD.

[...] More sends from RSHD to RSH.

12:36:00.660604 lo.shell > lo.1023: P 719760:721504(1744) ack 344 win 31744 (DF)
12:36:00.680604 lo.1023 > lo.shell: . ack 721504 win 4096
12:36:00.680604 lo.shell > lo.1023: P 721504:725022(3518) ack 344 win 31744
12:36:00.680604 lo.shell > lo.1023: F 725022:725022(0) ack 344 win 31744

***** The connection of stdin/stdout is closed by RSHD.
***** sk->state of RSHD becomes TCP_LAST_ACK, waiting for the ACK from RSH.

12:36:00.680604 lo.1023 > lo.shell: . ack 725023 win 577

***** RSH sends an ACK with number 725023. This covers the packet
***** <P 721504:725022(3518) ack 344 win 31744> and also the FIN, which
***** occupies sequence number 725022.
***** Receiving this packet, sk->state of RSHD becomes TCP_CLOSE.

12:36:00.770604 lo.1023 > lo.shell: . ack 725023 win 6144

***** RSH sends another ACK (same ack number 725023, larger window) after
***** the packet <F 725022:725022(0) ack 344 win 31744>.
***** sk->state of RSHD is already TCP_CLOSE, so the function
***** get_sock() (of af_inet.c) cannot find the "struct sock *" of this
***** particular RSHD. Instead, it returns the "struct sock *" of the
***** daemon which listens on the port.
***** More specifically, it returns <lo.shell, lo.1023, TCP_LISTEN> instead of
***** <lo.shell, lo.1023, TCP_CLOSE>.
***** This results in a call to tcp_send_reset() in the function tcp_rcv().
--------------- net/ipv4/tcp_input.c
	if(sk->state==TCP_LISTEN)
	{
		if(th->ack)	/* These use the socket TOS.. might want to be the received TOS */
			tcp_send_reset(daddr,saddr,th,sk->prot,opt,dev,sk->ip_tos, sk->ip_ttl);
---------------

12:36:00.770604 lo.shell > lo.1023: R 3547786161:3547786161(0) win 0

***** BAD. An RST is wrongly sent.

12:36:00.890604 lo.1022 > lo.1021: F 1:1(0) ack 2 win 31744
12:36:00.890604 lo.1021 > lo.1022: . ack 2 win 31744

***** The connection of signal/stderr is closed by RSH.
====================
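
As a cross-check on the trace: tcpdump prints sequence numbers relative
to the initial sequence number once it has seen the SYN, but the RST
line above shows an absolute value. The small sketch below (abs_seq is
a name made up for this note, not anything from the kernel or from
tcpdump) does the arithmetic. It suggests that the RST's sequence
number is simply the ack number 725023 from RSH, added to the ISN that
lo.shell chose in its SYN, which is what one would expect for a reset
generated in reply to that ACK.

```c
#include <stdint.h>

/*
 * Convert a relative sequence number, as tcpdump prints it after the
 * SYN, back to the absolute 32-bit value on the wire.
 * Hypothetical helper for illustration only.
 */
static uint32_t abs_seq(uint32_t isn, uint32_t rel)
{
    /* Unsigned arithmetic wraps modulo 2^32, like TCP sequence space. */
    return isn + rel;
}
```

With the values from the trace, abs_seq(3547061138u, 725023u) gives
3547786161, the number on the "R" line.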

I think two changes are needed.

(1) get_sock() should return <lo.shell, lo.1023, TCP_CLOSE> in this case.
(2) tcp_rcv() should handle this case correctly.
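
The two changes can be pictured with a toy model. This is only a sketch
under simplifying assumptions, not the kernel code: toy_sock,
toy_lookup() and toy_rcv_ack() are names invented here, sockets are
matched on ports only, and the "patched" flags stand in for the two
changes. The point is just to show why skipping a dead TCP_CLOSE socket
makes the late ACK land on the listener, and what happens once it does
not.

```c
#include <stddef.h>

enum toy_state   { ST_LISTEN, ST_LAST_ACK, ST_CLOSE };
enum toy_verdict { SEND_RESET, DISCARD, DELIVER };

/* Hypothetical, heavily reduced stand-in for "struct sock". */
struct toy_sock {
    unsigned short local_port;
    unsigned short remote_port;  /* 0 = wildcard, i.e. a listener */
    enum toy_state state;
    int dead;
};

/*
 * Best-match lookup: an exact remote port beats a wildcard.
 * skip_dead_closed != 0 models the check that the patch puts
 * under "#if 0".
 */
static const struct toy_sock *
toy_lookup(const struct toy_sock *tab, size_t n,
           unsigned short lport, unsigned short rport,
           int skip_dead_closed)
{
    const struct toy_sock *best = NULL;
    int best_score = -1;

    for (size_t i = 0; i < n; i++) {
        const struct toy_sock *s = &tab[i];
        int score = 0;

        if (s->local_port != lport)
            continue;
        if (skip_dead_closed && s->dead && s->state == ST_CLOSE)
            continue;
        if (s->remote_port) {
            if (s->remote_port != rport)
                continue;
            score++;
        }
        if (score > best_score) {
            best_score = score;
            best = s;
        }
    }
    return best;
}

/* What happens to a bare ACK segment that reaches socket sk. */
static enum toy_verdict
toy_rcv_ack(const struct toy_sock *sk, int patched)
{
    if (sk == NULL)
        return SEND_RESET;                 /* no socket at all */
    if (sk->state == ST_CLOSE)
        return patched ? DISCARD : SEND_RESET;
    if (sk->state == ST_LISTEN)
        return SEND_RESET;                 /* ACK sent to a listener */
    return DELIVER;
}
```

With a table holding the listener on port 514 (shell) and the dead,
closed connection to port 1023, the lookup with skip_dead_closed set
returns the listener and the ACK draws a reset. With it cleared, the
lookup returns the closed socket and toy_rcv_ack(sk, 1) discards the
segment quietly.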

Please test this out; I don't have time to test it today. The patch
is against 2.0.0 and is totally untested.

=========================
--- net/ipv4/af_inet.c~ Fri Jun 7 16:14:29 1996
+++ net/ipv4/af_inet.c Sat Nov 9 17:29:56 1996
@@ -1434,8 +1434,10 @@
 #endif
 			continue;

+#if 0
 		if(s->dead && (s->state == TCP_CLOSE))
 			continue;
+#endif
 		/* local address matches? */
 		if (s->rcv_saddr) {
 #ifdef CONFIG_IP_TRANSPARENT_PROXY
--- net/ipv4/tcp_input.c~ Sun Jun 9 17:39:09 1996
+++ net/ipv4/tcp_input.c Sat Nov 9 17:29:12 1996
@@ -1791,8 +1791,15 @@
 	 *	exist so should cause resets as if the port was unreachable.
 	 */

-	if (sk->zapped || sk->state==TCP_CLOSE)
+	if (sk->zapped)
 		goto no_tcp_socket;
+
+	if (sk->state == TCP_CLOSE)
+		/* More checks may be needed.
+		   th->ack
+		   sk->sent_seq == skb->ack_seq
+		 */
+		goto discard_it;

 	if (!sk->prot)
 	{
=========================

Thanks,

-- 
NIIBE Yutaka