Re: epoll (was Re: [PATCH] async poll for 2.5)

From: Charles 'Buck' Krasic (krasic@acm.org)
Date: Fri Oct 18 2002 - 16:01:08 EST


> >[N-1] for (;;) {
> >[N ] fd = event_wait(...);
> >[N+1] while (do_io(fd) != EAGAIN);
> >[N+2] }

I'm getting confused over what minute details are being disputed here.

This debate might get clearer, to me anyway, if the example code
fragments were more concrete.

So if anybody still cares at this point, here is my stab at clarifying
some things.

PART I: THE RACE

Suppose we have the following:

 1  for(;;) {
 2      fd = event_wait(...);
 3      if(fd == my_listen_fd) {
 4          /* new connections */
 5          while((new_fd = my_accept(my_listen_fd, ...)) != EAGAIN)
 6              epoll_addf(new_fd, ...);
 7      } else {
 8          /* established connections */
 9          while(do_io(fd) != EAGAIN);
10      }
11  }

With the current epoll/rtsig semantics, there is a race condition
above. I think this is essentially the same race condition as the
snippet at the top of this message.

Just to be clear, I will walk completely through the steps in the race
scenario, as follows.

We start with our application blocked in line 2.

A new connection is initiated by the application on the other side.

The kernels exchange SYNs, causing the connection to be established.

The kernel on our side queues the new connection, waiting for the
application on this side to call accept(). In the process it fires an
edge POLLIN on the listen_fd, which wakes up the kernel side of line
2. However, some time may pass before we actually wake up.

Meanwhile, the other side immediately sends some application level
data. The other side is going to wait for us to read the application
level data and respond. So it is now blocked.

All of this happens before our application runs line 5 to pick up the
new connection from the kernel.

Here comes the race:

Before we reach line 6, new_fd is not in epoll mode, so packet
arrivals do not trigger a POLLIN edge notification on new_fd.

After line 6, there will be no data from the other side, so there will
still be no POLLIN edge notification for new_fd.

Therefore, line 2 will never yield a POLLIN event for new_fd, and the
new connection is now deadlocked.
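
To make the sequence above concrete, here is a toy user-space model of
the edge-only behaviour being discussed. It is only a sketch: every
name in it (toy_conn, toy_packet_arrives, toy_epoll_addf, toy_replay)
is hypothetical, and it models the semantics as described in this
message rather than any actual kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one connection under edge-only notification.
 * All names are hypothetical; this is not kernel code. */
struct toy_conn {
    size_t unread;      /* bytes queued but not yet read */
    bool   registered;  /* has toy_epoll_addf() been called? */
    bool   edge;        /* a POLLIN edge is waiting for event_wait() */
};

/* Data arrives: an edge fires only if the fd is already registered. */
static void toy_packet_arrives(struct toy_conn *c, size_t len)
{
    c->unread += len;
    if (c->registered)
        c->edge = true;
}

/* Registration under the semantics discussed here:
 * data that is already queued does not generate an edge. */
static void toy_epoll_addf(struct toy_conn *c)
{
    c->registered = true;
}

/* Replay the scenario. Returns true if event_wait() would
 * eventually report the connection, false if it never would. */
static bool toy_replay(bool register_first)
{
    struct toy_conn c = { 0, false, false };

    if (register_first) {
        toy_epoll_addf(&c);
        toy_packet_arrives(&c, 100);
    } else {
        toy_packet_arrives(&c, 100);  /* peer sends, then blocks */
        toy_epoll_addf(&c);           /* line 6 runs too late */
    }
    assert(c.unread == 100);          /* the data is there either way */
    return c.edge;
}
```

In this model toy_replay(false) is the deadlock case: the 100 bytes are
queued, yet no edge is ever recorded, so nothing would wake line 2 for
that connection.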

Is this the kind of race we're talking about?

If so, we proceed as follows.

PART II: SOLUTIONS

A race-free way to write the code above is as follows. Only
one new line (marked with *) is added.

 1  for(;;) {
 2      fd = event_wait(...);
 3      if(fd == my_listen_fd) {
 4          /* new connections */
 5          while((new_fd = my_accept(my_listen_fd, ...)) != EAGAIN) {
 6              epoll_addf(new_fd, ...);
 7*             while(do_io(new_fd) != EAGAIN);
 8          }
 9      } else {
10          /* established connections */
11          while(do_io(fd) != EAGAIN);
12      }
13  }

The example above works with current epoll and rtsig semantics. This
is just rephrasing what Davide has been saying: "Never call event_wait
without first ensuring that IO space is definitively exhausted".
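
For what it's worth, here is one way the "while(do_io(fd) != EAGAIN);"
idiom might look for the read side on a non-blocking socket. This is
just a sketch: drain_fd, set_nonblocking and demo_drain are names I
made up for illustration, and a real do_io would also process the data
and handle writes.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical stand-in for "while(do_io(fd) != EAGAIN);".
 * Reads until the kernel reports EAGAIN/EWOULDBLOCK or end of file.
 * Returns the number of bytes consumed, or -1 on a real error. */
static long drain_fd(int fd)
{
    char buf[4096];
    long total = 0;

    assert(fd >= 0);
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;        /* a real server would process buf here */
        } else if (n == 0) {
            return total;      /* peer closed the connection */
        } else if (errno == EINTR) {
            continue;          /* interrupted, just retry */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return total;      /* IO space exhausted: safe to wait */
        } else {
            return -1;
        }
    }
}

/* Demo: queue 5 bytes on a socketpair, then drain them.
 * Returns the number of bytes drained, or -1 on error. */
static long demo_drain(void)
{
    int sv[2];
    long got = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (set_nonblocking(sv[0]) == 0 && write(sv[1], "hello", 5) == 5)
        got = drain_fd(sv[0]);
    close(sv[0]);
    close(sv[1]);
    return got;
}
```

The point of returning only on EAGAIN (or EOF) is that the caller then
knows any later data must produce a fresh edge, so going back to
event_wait is safe.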

Or we could have (to make John happier?):

 1  for(;;) {
 2      fd = event_wait(...);
 3      if(fd == my_listen_fd) {
 4          /* new connections */
 5          while((new_fd = my_accept(my_listen_fd, ...)) != EAGAIN) {
 6*             epoll_addf(new_fd, &pfd, ...);
 7*             if(pfd.revents & POLLIN) {
 8*                 while(do_io(new_fd) != EAGAIN);
 9*             }
10          }
11      } else {
12          /* established connections */
13          while(do_io(fd) != EAGAIN);
14      }
15  }

Here, the epoll_addf primitive has been modified to return the initial
status, presumably so that we avoid the first call to do_io if there
is nothing to do yet.

If it's easy to do (changing the add primitive, that is), why not?
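
As an illustration only, the "return the initial status" behaviour can
be approximated in user space by following the add with a zero-timeout
poll(). The sketch below assumes the epoll_ctl()/epoll_create1()
interface found in current Linux, which is not necessarily the
interface under discussion here; epoll_addf_status and
demo_initial_status are hypothetical names.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical epoll_addf(): register fd for edge-triggered input on
 * epfd and report its readiness at registration time in pfd->revents.
 * Emulated with a zero-timeout poll(); the suggestion in this message
 * is for the add primitive itself to return this information.
 * Returns 0 on success, -1 on error. */
static int epoll_addf_status(int epfd, int fd, struct pollfd *pfd)
{
    struct epoll_event ev = { 0 };

    assert(pfd != NULL);
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
        return -1;

    pfd->fd = fd;
    pfd->events = POLLIN;
    pfd->revents = 0;
    /* Timeout 0: sample the current state without blocking. */
    if (poll(pfd, 1, 0) < 0)
        return -1;
    return 0;
}

/* Demo: data is queued before registration, as in the race scenario.
 * Returns 1 if the initial status reports POLLIN, 0 if not,
 * -1 on error. */
static int demo_initial_status(void)
{
    int sv[2], epfd, result = -1;
    struct pollfd pfd;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd >= 0) {
        if (write(sv[1], "x", 1) == 1 &&
            epoll_addf_status(epfd, sv[0], &pfd) == 0)
            result = (pfd.revents & POLLIN) ? 1 : 0;
        close(epfd);
    }
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Because the fd is registered before its state is sampled, data that
arrives between the two steps still raises an edge, so the caller
either sees POLLIN in revents or gets a later event.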

The first solution works either way.

-- Buck
