nonempty pipe does not select for write

Stanislav Meduna (stano@trillian.eunet.sk)
Sun, 26 Sep 1999 14:20:28 +0200 (CEST)


Hello,

I am summarizing a discussion that took place in
a local mailing list.

Given following program

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>

int can_write(int fd)
{
fd_set fds;
struct timeval tv = {0, 0};
FD_ZERO(&fds);
FD_SET(fd, &fds);
return select(fd + 1, NULL, &fds, NULL, &tv);
}

int main()
{
int p[2];
pipe(p);
write(p[1], "x", 1);
printf("%d\n", can_write(p[1]));
return 0;
}

the output is that there is no possibility to write
to a pipe filedescriptor, when there is something
in the pipe that was not read. The behaviour
is the same regardless of the pipe being a loop
in the same process or 'regular' interprocess pipe.

The theory is that if the PIPE_BUF is set to the actual
kernel buffer size for the pipe, it is in fact impossible
to do anything else - the pipe writes up to PIPE_BUF
are guaranteed to be atomic and if the descriptor selects
for write, it should be guaranteed that the next write
does not block. As the amount of the data in the send
is not known at the time of select, it is impossible
to let the user do a write when the remaining size of
the kernel buffer is less than PIPE_BUF.

This is a bad news - many programs communicate via pipes
and the protocol is not always request-response - a simple
signalling of some non-fd events is often also done this
way. This situation forces completely unnecessary context
switching between such processes. If I do a simple
two-process test sending one byte each other:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>

int main()
{
int p[2];
pipe(p);

if (fork())
{
int x = 0;
close(p[0]);
while(1)
{
fd_set fds;
struct timeval tv = {1, 0};
FD_ZERO(&fds);
FD_SET(p[1], &fds);
if (select(p[1] + 1, NULL, &fds, NULL, &tv) > 0)
{
write(p[1], "x", 1);
x++;
if (! (x % 100000))
fprintf(stderr, ".");
}
}
}
else
{
char foo;
close(p[1]);
while(1)
read(p[0], &foo, 1);
}
}

the context-switch rate (as seen in /proc/stat) rockets
to about 75000 / sec, the data rate is half of that and the
user:system time ratio is about 20:80 (2.2.12, PPro 166 MHz)

The possible solution seems to either enlarge the buffer
in kernel, or to reduce the PIPE_BUF - the ratio 1:2
seems optimal.

Regards

-- 
				Stano

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/