RE: epoll design problems with common fork/exec patterns

From: David Schwartz
Date: Sun Oct 28 2007 - 00:47:40 EST



> 6) Epoll removes the file from the set, when the *kernel* object gets
> closed (internal use-count goes to zero)
>
> With that in mind, how can the code snippet above trigger a removal from
> the epoll set?

I don't see how that can be. Suppose I add fd 8 to an epoll set. Suppose fd
5 is a dup of fd 8. Now, I close fd 8. How can fd 8 remain in my epoll set,
since there no longer is an fd 8? Events on files registered for epoll
notification are reported by descriptor, so the set membership has to be
associated (as reflected into userspace) with the descriptor, not the file.
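The dup case can be tried directly. What follows is only a minimal sketch, not anything from the original thread: the function name event_after_close_of_dup is made up for illustration, a pipe stands in for the registered file, and epoll_create1 is assumed to be available. It registers a descriptor, duplicates it, closes the registered number, and then reports which fd number epoll hands back.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns the fd number that epoll reports after the
 * registered descriptor was closed while a dup still keeps the open file
 * alive. Returns -1 on setup failure and -2 if no event arrives within one
 * second. The registered fd number is stored in *registered_fd. */
int event_after_close_of_dup(int *registered_fd)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;

    int ep = epoll_create1(0);
    if (ep < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN };
    ev.data.fd = p[0];                  /* userspace tags the entry with the fd number */
    if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev) != 0) {
        close(p[0]);
        close(p[1]);
        close(ep);
        return -1;
    }

    int copy = dup(p[0]);               /* second descriptor for the same open file */
    *registered_fd = p[0];
    close(p[0]);                        /* the registered fd number no longer exists */

    int reported = -2;
    if (copy >= 0 && write(p[1], "x", 1) == 1) {
        struct epoll_event out;
        if (epoll_wait(ep, &out, 1, 1000) == 1)
            reported = out.data.fd;     /* whatever was stored at registration time */
    }

    if (copy >= 0)
        close(copy);
    close(p[1]);
    close(ep);
    return reported;
}

/* Example use:
 *     int reg;
 *     assert(event_after_close_of_dup(&reg) == reg);
 */
```

On a Linux system where the sketch behaves as described, the value returned equals the number that was closed, which is the stale-descriptor report discussed above.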

For example, consider:

1) Process creates an epoll set, the set gets fd 4.

2) Process creates a socket, it gets fd 5.

3) The process adds fd 5 to set 4.

4) The process forks.

5) The child inherits the epoll set but not the socket.

Here the kernel cannot quite do the right thing. Ideally, the parent would
still have fd 5 in its version of the epoll set. After all, it has not
closed fd 5. However, the child *cannot* see fd 5 in its version of the
epoll set since it has no fd 5. An event reported for fd 5 would be
nonsense.
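Steps 1 to 5 can be sketched as a small test. This is an illustration under stated assumptions, not code from the thread: the helper name child_sees_event_for_closed_fd is invented, socketpair stands in for "creates a socket", and "the child does not have the socket" is modeled by the child closing its copy right after fork.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the child is handed an event naming a
 * descriptor it has already closed, 0 if not, -1 on setup failure. */
int child_sees_event_for_closed_fd(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)   /* step 2 (stand-in socket) */
        return -1;

    int ep = epoll_create1(0);                          /* step 1 */
    if (ep < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN };
    ev.data.fd = sv[0];
    int result = -1;

    /* Step 3, plus one byte written to the peer so sv[0] is already readable. */
    if (epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev) == 0 &&
        write(sv[1], "x", 1) == 1) {
        pid_t pid = fork();                             /* step 4 */
        if (pid == 0) {
            close(sv[0]);                               /* step 5: child has no socket */
            struct epoll_event out;
            int n = epoll_wait(ep, &out, 1, 1000);
            _exit(n == 1 && out.data.fd == sv[0] ? 0 : 1);
        }
        if (pid > 0) {
            int status = 0;
            if (waitpid(pid, &status, 0) == pid)
                result = WIFEXITED(status) && WEXITSTATUS(status) == 0;
        }
    }

    close(sv[0]);
    close(sv[1]);
    close(ep);
    return result;
}

/* Example use:
 *     assert(child_sees_event_for_closed_fd() == 1);
 */
```

A return value of 1 means the child received an event for an fd number it no longer holds, i.e. the "nonsense" report described above.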

So it seems the kernel either has to break one of these "would/cannot"
requirements, or it has to split the epoll set in two. However, splitting
the set into two sets is clearly wrong since the processes should share it.

Q6 Will the close of an fd cause it to be removed from all epoll sets
automatically?

A6 Yes.

Note that this talks of the close of an "fd", not a file. The 'close'
function in fact closes an fd, as that fd is then reusable. So it sounds
like the problem above is solved by removing the fd from the set, but in
practice this doesn't happen. I have programs that call 'close' between
'fork' and 'exec' and do not see the socket removed from the epoll set.
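One way to check this observation from the parent's side is sketched below. It is an assumption-laden illustration rather than the code from my programs: the name still_registered_after_child_close is invented, socketpair stands in for the socket, and the child simply exits where a real program would call exec. The probe relies on EPOLL_CTL_ADD failing with EEXIST when the descriptor is still registered.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the socket is still in the epoll set
 * after a forked child closed its copy, 0 if it was removed, -1 on setup
 * failure. */
int still_registered_after_child_close(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    int ep = epoll_create1(0);
    if (ep < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN };
    ev.data.fd = sv[0];
    int result = -1;

    if (epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev) == 0) {
        pid_t pid = fork();
        if (pid == 0) {
            close(sv[0]);       /* the 'close' between 'fork' and 'exec' */
            _exit(0);           /* a real program would exec here */
        }
        if (pid > 0) {
            int status = 0;
            if (waitpid(pid, &status, 0) == pid) {
                /* Probe: a second ADD fails with EEXIST if the entry survived. */
                int rc = epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev);
                result = (rc == -1 && errno == EEXIST) ? 1 : 0;
            }
        }
    }

    close(sv[0]);
    close(sv[1]);
    close(ep);
    return result;
}

/* Example use:
 *     assert(still_registered_after_child_close() == 1);
 */
```

A result of 1 matches what I see in practice: the child's close does not take the socket out of the shared set.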

DS

