Linux frop 2.0.25 #2 Mon Nov 18 15:32:26 EST 1996 i586
I am the author of the netpipes package (version 3.2 released, 4.0 in
development). I have run into what I think may be a kernel bug.
While developing the ssl-auth encryption/authentication wrapper I have
(intermittently) been getting a "Connection reset by peer" error when
READING from a socket. Since the other end of the connection is not
experiencing any errors, I can only present the following theory:
Some of my utilities (http://www.purplefrog.com/~thoth/netpipes/) have
buffering code in them. Notably hose when invoked with the -slave argument
copies from stdin to the socket and from the socket to stdout. When input
is exhausted on stdin, the hose program issues a shutdown() system call
which closes the sending half of the socket but leaves the receiving half
open. This is necessary to prevent deadlock.
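For readers unfamiliar with the half-close, the stdin-to-socket direction
described above can be sketched in C roughly as follows. This is not the
actual hose source: the function name drain_and_half_close and the 4096-byte
buffer are made up for illustration, and the socket-to-stdout direction is
left out.

```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from netpipes): copy everything readable
 * from in_fd to sock, then close only the sending half of the socket.
 * Returns 0 on success, -1 on any read/write/shutdown error.
 */
int drain_and_half_close(int in_fd, int sock)
{
    char buf[4096];
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        /* write() may accept fewer bytes than requested, so loop. */
        while (off < n) {
            ssize_t w = write(sock, buf + off, (size_t)(n - off));
            if (w <= 0)
                return -1;
            off += w;
        }
    }
    if (n < 0)
        return -1;

    /* Input is exhausted: tell the peer we will send nothing more.
     * The receiving half stays open, so replies can still be read. */
    return shutdown(sock, SHUT_WR);
}
```

After the shutdown the peer should see end-of-file once it has read all the
data already sent, while anything it writes back can still be read from sock.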
The instance of the error that I have concentrated on is when two
processes on the same machine are communicating through a TCP socket. If
one process writes a lot of data (64K) and then performs a shutdown(sock,1)
all while the receiving process is blocked, the receiving process has a
small chance (roughly 1 in 5, by my estimate) of getting the "Connection
reset by peer" error.
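The scenario can also be reduced to a single C program that does not depend
on netpipes. The following is only a sketch under assumptions: the name
half_close_trial, the PAYLOAD constant and the use of fork() with an
ephemeral loopback port are illustrative choices, not part of the original
setup. SHUT_WR is the symbolic name for the value 1 passed to shutdown() on
Linux.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define PAYLOAD (64 * 1024)   /* illustrative: same 64K as in the report */

/*
 * Hypothetical reproduction helper. A child connects over loopback TCP,
 * writes PAYLOAD bytes and half-closes; the parent waits delay_s seconds
 * before reading. Returns the number of bytes the receiver read before
 * end-of-file, or -1 if a call failed (for example with ECONNRESET).
 */
long half_close_trial(unsigned delay_s)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    long total = 0;
    pid_t pid;
    int lsock, c;

    lsock = socket(AF_INET, SOCK_STREAM, 0);
    if (lsock < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* kernel picks a free port */
    if (bind(lsock, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lsock, 1) < 0 ||
        getsockname(lsock, (struct sockaddr *)&addr, &len) < 0) {
        close(lsock);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(lsock);
        return -1;
    }
    if (pid == 0) {                          /* sender */
        static char data[PAYLOAD];
        char sink[256];
        size_t off = 0;
        int s = socket(AF_INET, SOCK_STREAM, 0);

        close(lsock);
        if (s < 0 || connect(s, (struct sockaddr *)&addr, sizeof addr) < 0)
            _exit(1);
        while (off < sizeof data) {
            ssize_t n = write(s, data + off, sizeof data - off);
            if (n <= 0)
                _exit(1);
            off += (size_t)n;
        }
        shutdown(s, SHUT_WR);                /* same as shutdown(sock,1) */
        while (read(s, sink, sizeof sink) > 0)
            ;                                /* wait for the peer to close */
        _exit(0);
    }

    c = accept(lsock, NULL, NULL);           /* receiver */
    close(lsock);
    if (c < 0) {
        total = -1;
    } else {
        char buf[4096];
        ssize_t n;

        sleep(delay_s);                      /* stay idle while data arrives */
        while ((n = read(c, buf, sizeof buf)) > 0)
            total += n;
        if (n < 0)
            total = -1;                      /* e.g. ECONNRESET */
        close(c);
    }
    waitpid(pid, NULL, 0);
    return total;
}
```

Calling half_close_trial(10) in a loop and counting the -1 results would
give a rough failure rate; on a kernel without the problem every call is
expected to return 65536.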
On the receiving end:
frop:107 $ faucet 3000 -vio sh -c "sleep 10; cat"
faucet; of netpipes version 4.0, Copyright (C) 1992-96 Robert Forsman
faucet comes with ABSOLUTELY NO WARRANTY;
This is free software, and you are welcome to redistribute it
under the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or
(at your option) any later version.
faucet: Got connection from 127.0.0.1(localhost) port 1311
On the sending end:
frop:32 $ dd if=/vmlinuz bs=4096 count=16 | hose localhost 3000 -v \
-slave > /tmp/shit
hose; of netpipes version 4.0, Copyright (C) 1992-96 Robert Forsman
hose comes with ABSOLUTELY NO WARRANTY;
This is free software, and you are welcome to redistribute it
under the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or
(at your option) any later version.
hose: attempting to connect to 127.0.0.1(localhost) port 3000
16+0 records in
16+0 records out
during copyio() read(2)(0): Connection reset by peer
I think something is going wrong in the buffering of data in the kernel.
Pop Quiz:
1) Why is Bob having this problem? [100%]
a) This is a kernel bug. It is/will-be fixed in version __.__
b) You are using the system calls wrong. You should use the following
procedure to notify the remote process that input is exhausted on the
file descriptor: ___________________________________________________
2) [There is no question 2]
For extra credit if you answer [a] to question 1:
Provide a patch to fix the kernel bug.
--- Bob Forsman thoth@gainesville.fl.us http://www.gainesville.fl.us/~thoth/