Re: epoll design problems with common fork/exec patterns

From: Michael Kerrisk
Date: Thu Feb 28 2008 - 08:23:47 EST


On Thu, Feb 28, 2008 at 2:12 PM, Michael Kerrisk
<mtk.manpages@xxxxxxxxxxxxxx> wrote:
>
> On Wed, Feb 27, 2008 at 8:35 PM, Davide Libenzi <davidel@xxxxxxxxxxxxxxx> wrote:
> > On Tue, 26 Feb 2008, Chris "ク" Heath wrote:
> >
> > > On Tue, 2008-02-26 at 10:51 -0800, Davide Libenzi wrote:
> > > >
> >
> > > > Yes, you can't add the same fd twice. Think about a DB where "file*,fd" is
> > > > the key.
> > >
> > > To clarify, the key appears to be file* plus the user-space integer that
> > > represents the fd.
> >
> > Yes, that's what I said.
> >
> > > > > c) It is possible to add duplicated file descriptors referring to the same
> > > > > underlying open file description ("file *"). As you note, this can be a
> > > > > useful filtering technique, if the two file descriptors specify different
> > > > > masks.
> > > > >
> > > > > Assuming that is all correct, for man-pages-2.79, I've reworked the text
> > > > > for Q1/A1 as follows:
> > > > >
> > > > > Q1 What happens if you add the same file descriptor
> > > > > to an epoll set twice?
> > > > >
> > > > > A1 You will probably get EEXIST. However, it is pos-
> > > > > sible to add a duplicate (dup(2), dup2(2),
> > > > > fcntl(2) F_DUPFD, fork(2)) descriptor to the same
> > > > > epoll set. This can be a useful technique for
> > > > > filtering events, if the duplicate file descrip-
> > > > > tors are registered with different events masks.
> > > > >
> > > > > Seem okay Davide?
> > > >
> > > > Looks sane to me.
> > >
> > > I think fork(2) should not be in the above list. fork(2) duplicates the
> > > kernel's fd, but the user-space integer that represents the fd remains
> > > the same, so you will get EEXIST if you try to add the fd that was
> > > duplicated by fork.
> >
> > Good catch, fork(2) should not be there.
>
> Okay -- removed.
>
> But it is an ugly inconsistency. On the one hand, a child process
> cannot add the duplicate file descriptor to the epoll set. (In every
> other case that I can think of, descriptors duplicated by fork have
> similar semantics to descriptors duplicated by dup() and friends.) On
> the other hand, the very fact that the child has a duplicate of the
> descriptor means that even if the parent closes its descriptor, then
> epoll_wait() in the parent will continue to receive notifications for
> that descriptor because of the duplicated descriptor in the child.
>
> The choice of [file *, fd] as the key for epoll sets really does seem
> unfortunate. Keying on [pid, fd] would have given saner semantics, it
> seems to me. Obviously it can't be changed now though.
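
For anyone who wants to check the two fork(2) behaviours described above,
here is a minimal sketch. It uses Python's select.epoll (a thin wrapper
over epoll_create/epoll_ctl/epoll_wait) only to keep the test short. The
function name fork_demo and the "gate" pipe that keeps the child alive are
my own inventions for illustration, not anything taken from the kernel or
the man pages. It assumes Linux.

```python
import errno
import os
import select


def fork_demo():
    """Return (child_add_result, fds_reported_to_parent, watched_fd)."""
    r, w = os.pipe()              # pipe whose read end is watched
    gate_r, gate_w = os.pipe()    # hypothetical helper: keeps the child alive
    ep = select.epoll()
    ep.register(r, select.EPOLLIN)

    pid = os.fork()
    if pid == 0:
        # Child: same fd number, same open file description, same epoll set.
        status = 2
        try:
            os.close(gate_w)
            try:
                ep.register(r, select.EPOLLIN)
                status = 0                    # the add succeeded
            except OSError as e:
                status = 1 if e.errno == errno.EEXIST else 2
            os.read(gate_r, 1)                # block until the parent closes gate_w
        finally:
            os._exit(status)

    # Parent: close our copy of r; the child still holds the duplicate.
    os.close(gate_r)
    os.close(r)
    os.write(w, b"x")
    reported = [fd for fd, _ in ep.poll(1.0)]

    os.close(gate_w)                          # release the child
    _, wait_status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(wait_status)
    child_result = {0: None, 1: "EEXIST"}.get(code, "other")

    ep.close()
    os.close(w)
    return child_result, reported, r


if __name__ == "__main__":
    result, reported, fd = fork_demo()
    print("child add:", result)
    print("parent closed fd", fd, "but epoll reported", reported)
```

If the semantics are as discussed in this thread, the child's add should
fail with EEXIST, and the parent should still see an event for the
descriptor number it has already closed.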


Davide,

with the earlier discussion in this thread in mind, I added a Q0/A0 to
epoll.7, just to make the point about keys clear:


Q0 What is the key used to distinguish the file descrip-
tors in an epoll set?

A0 The key is the combination of the file descriptor
number and the open file description (also known as
"open file handle", the kernel's internal representa-
tion of an open file).

Does that seem okay?
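
A quick way to see the key at work is the sketch below, again using
Python's select.epoll on Linux purely for brevity. The helper names
try_add and demo_key are hypothetical. It adds the same descriptor twice
(identical key), then adds a dup(2) of it (same open file description,
different descriptor number) with a different event mask.

```python
import errno
import os
import select


def try_add(ep, fd, mask):
    """Attempt EPOLL_CTL_ADD; return None on success, else the errno name."""
    try:
        ep.register(fd, mask)
        return None
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))


def demo_key():
    """Return the results of three registrations against one epoll set."""
    r, w = os.pipe()
    r2 = -1
    ep = select.epoll()
    try:
        first = try_add(ep, r, select.EPOLLIN)    # new [file *, fd] key
        again = try_add(ep, r, select.EPOLLIN)    # identical key
        r2 = os.dup(r)                            # same file *, new fd number
        dup = try_add(ep, r2, select.EPOLLIN | select.EPOLLET)  # different mask
        return first, again, dup
    finally:
        ep.close()
        for fd in (r, w, r2):
            if fd >= 0:
                os.close(fd)


if __name__ == "__main__":
    print(demo_key())
```

On a Linux system this should print (None, 'EEXIST', None): the second add
of the same descriptor is rejected, while the duplicate is accepted as a
separate entry.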

Cheers,

Michael

--
Michael Kerrisk
Maintainer of the Linux man-pages project
http://www.kernel.org/doc/man-pages/
Want to report a man-pages bug? Look here:
http://www.kernel.org/doc/man-pages/reporting_bugs.html