Re: recvfrom in linux not retrun; problem description in detail

Paul (set@pobox.com)
Thu, 3 Jun 1999 23:57:37 -0400 (EWT)


On Thu, 3 Jun 1999, Xiaodi Lu wrote:

@>
@>Hi,
@>I found I didn't describe my question clear. I give more detail here.
@>
@>Here is the example code.
@>-----------------------------------
@>main( ){
@>........
@> signal(SIGALRM, (void) Handler);
@> flag=0;
@> alarm(1);
@> printf("message 1");
@> recvfrom(..............);
@> printf("message 3");
@>}
@>Handler(int sig) {
@> flag=1;
@> printf("message 2");
@> //kill(getpid( ), SIGPIPE);
@> return 0;
@>}
@>-------------------------------------
@>Most time it works fine. But when error happen(may be one packet loss),
@>I saw the message 1, and message 2. Message 3 not comming.
@>recvfrom doesn't even return an error value such as EINTR.
@>If I send SIGPIPE signal in Handler, the process will stop and show
@>error message
@>"pipe broken".
@>If don't send SIGPIPE, the process will hang there forever.
@>
@>I also tried to set the socket as non-blocking. It gives recvfrom
@>error very often. Actually my client need to send many packets to
@>server, and the serveris continuously running. Set it as non-block may
@>lost a
@>lot of data.
@>
@>I used redhat 5.2, kernel 2.0.36
@>

This looks like correct behaviour. Here is a snipit from a netbsd man
page for signal. I assume we treat SA_RESTART similarly:

For some system calls, if a signal is caught while the call is executing
and the call is prematurely terminated, the call is automatically
restarted. (The handler is installed using the SA_RESTART flag with
sigaction(2).) The affected system calls include read(2), write(2),
sendto(2), recvfrom(2), sendmsg(2) and recvmsg(2) on a communications
channel or a low speed device and during a ioctl(2) or wait(2). However,
calls that have already committed are not restarted, but instead return a
partial success (for example, a short read count).

So, it looks like a UDP echo packet gets lost somewhere (this is not an
'error' -- UDP packets are defined as unreliable, and errors are not generated
when they are lost) Then, if the implementation of signal sets the SA_RESTART
flag when it uses sigaction, then the syscall starts over, and continues to
wait forever for the lost packet.

In fact, I tested it with this code:

signal(SIGALRM, (sighandler_t)Handler);
sigaction(SIGALRM, NULL, &oldact);
printf("flags: %x\n", oldact.sa_flags);

The output was;

flags: 10000000

And in <sigaction.h>, we see:
#define SA_RESTART 0x10000000 /* Don't restart syscall on signal return. */

If you dont like this behaviour, you will have to use sigaction(), and dont
set that bit. (or unset it) Like: (continuing on the code fragment above)

oldact.sa_flags &= ~SA_RESTART;
sigaction(SIGALRM, &oldact, NULL);

Paul

ps. The comment in sigaction.h seems wrong.

pps. If you would like to see a complete code implementation illustrating
this, let me know.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/