Re: unexpected extra pollout events from epoll

From: Davide Libenzi
Date: Sun Oct 26 2008 - 18:07:39 EST


On Sun, 26 Oct 2008, Paul P wrote:

> I am programming a server using the epoll interface and have the receive portion of the server working fine, but for some reason as I implement the send portion, I noticed a few things that seem like strange behaviors in the implementation of epoll in the kernel.
>
> I'm running Opensuse 11 and it has a 2.6.25 kernel.
>
> The behavior that I am seeing is when I do a full read on an edge
> triggered fd, for some reason, it seems to be triggering an epollout
> event after each loop of the read events on a socket. (before I've done
> any writes at all to the socket)
>
> This is very strange behavior as I would expect that the epollout event
> would only be triggered if I did a write and the socket received an ack
> which cleared out the send buffer.
>
> The documentation on epollout is really sparse, so any help at all from
> the list would be very much appreciated. Do I need to manually arm the
> epollout flag after a write? I thought this was only necessary for
> level triggered epoll.

The way epoll works is by hooking into the existing kernel poll
subsystem. It hooks into the poll wakeups, via callback, and in that way
it knows that "something" has changed. Then it calls f_op->poll() on the
file to find out what the actual status is.
What happens is that, if you listen for EPOLLIN|EPOLLOUT, when a packet
arrives the callback hook is hit, and the file is put into a maybe-ready
list. Maybe-ready because at the time of the callback, epoll has no clue
of what happened.
After that, via epoll_wait(), f_op->poll() is called to get the status of
the file, and since POLLIN|POLLOUT is returned (and since you're listening
for EPOLLIN|EPOLLOUT), that gets reported back to you.
The POLLOUT event, in the sense of a buffer-full->buffer-avail transition,
did not really happen, but since POLLOUT is true, it gets reported back too.
This, again, is because epoll has no clue of what happened at callback hit
time.
I'm working on changes that will make epoll aware (by using the existing
support for the "key" parameter of wakeups) of events at callback time,
but this is something that is still up for discussion and definitely won't
be in 2.6.28.
The best way to do it ATM is to wait for POLLOUT only when really needed.
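
As an illustration of "wait for POLLOUT only when really needed", here is a
minimal sketch in C. It is not taken from the kernel or from the original
poster's server. The helper names (set_interest, poll_once, send_or_arm) are
hypothetical, and it assumes a non-blocking style of send on an fd that is
already registered with EPOLLIN. The idea is to keep the fd registered for
EPOLLIN only, try the write directly, and add EPOLLOUT to the interest mask
only when the kernel could not take all of the data.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: register (EPOLL_CTL_ADD) or re-register
 * (EPOLL_CTL_MOD) fd with the given mask, always edge-triggered. */
static int set_interest(int epfd, int fd, int op, uint32_t events)
{
    struct epoll_event ev = { .events = events | EPOLLET, .data.fd = fd };
    return epoll_ctl(epfd, op, fd, &ev);
}

/* Hypothetical helper: wait for one event and return its mask,
 * or 0 on timeout or error. */
static uint32_t poll_once(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    return n == 1 ? ev.events : 0;
}

/* Hypothetical helper: try to send right away. Only if the send comes up
 * short (or would block) is EPOLLOUT added to the interest mask, so that
 * epoll_wait() reports when the socket becomes writable again.
 * Returns the number of bytes accepted (possibly 0), or -1 on error. */
static ssize_t send_or_arm(int epfd, int fd, const void *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, MSG_DONTWAIT | MSG_NOSIGNAL);

    if (n < 0) {
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;          /* real error, caller should close fd */
        n = 0;                  /* nothing accepted, buffer is full */
    }
    if ((size_t)n < len) {
        /* Data is still pending: now EPOLLOUT is really needed. */
        if (set_interest(epfd, fd, EPOLL_CTL_MOD, EPOLLIN | EPOLLOUT) < 0)
            return -1;
    }
    return n;
}
```

The caller is expected to keep the unsent tail of the buffer, flush it when
EPOLLOUT is reported, and then go back to EPOLLIN only with
set_interest(epfd, fd, EPOLL_CTL_MOD, EPOLLIN). With the fd registered that
way, incoming packets are reported as EPOLLIN alone, because the reported
mask is limited to the events being listened for.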




- Davide

