On 9/2/19 11:36 AM, Roman Penyaev wrote:
Hi,
This is indeed a bug. (quick side note: could you please remove efd[1]
from your test, because it is not related to the reproduction of a
current bug).
Your patch lacks a good description, what exactly you've fixed. Let
me speak out loud and please correct me if I'm wrong, my understanding
of epoll internals has become a bit rusty: when epoll fds are nested
an attempt to harvest events (ep_scan_ready_list() call) produces a
second (repeated) event from an internal fd up to an external fd:
ÂÂÂÂ epoll_wait(efd[0], ...):
ÂÂÂÂÂÂ ep_send_events():
ÂÂÂÂÂÂÂÂÂ ep_scan_ready_list(depth=0):
ÂÂÂÂÂÂÂÂÂÂÂ ep_send_events_proc():
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ ep_item_poll():
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ ep_scan_ready_list(depth=1):
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ ep_poll_safewake():
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ ep_poll_callback()
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ list_add_tail(&epi, &epi->rdllist);
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ ^^^^^^
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ repeated event
In your patch you forbid wakeup for the cases, where depth != 0, i.e.
for all nested cases. That seems clear. But what if we can go further
and remove the whole chunk, which seems excessive:
@@ -885,26 +886,11 @@ static __poll_t ep_scan_ready_list(struct
eventpoll *ep,
-
-ÂÂÂÂÂÂ if (!list_empty(&ep->rdllist)) {
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ /*
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ * Wake up (if active) both the eventpoll wait list and
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ * the ->poll() wait list (delayed after we release the
lock).
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ */
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ if (waitqueue_active(&ep->wq))
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ wake_up(&ep->wq);
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ if (waitqueue_active(&ep->poll_wait))
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ pwake++;
-ÂÂÂÂÂÂ }
ÂÂÂÂÂÂÂ write_unlock_irq(&ep->lock);
ÂÂÂÂÂÂÂ if (!ep_locked)
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ mutex_unlock(&ep->mtx);
-ÂÂÂÂÂÂ /* We have to call this outside the lock */
-ÂÂÂÂÂÂ if (pwake)
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ ep_poll_safewake(&ep->poll_wait);
I reason like that: by the time we've reached the point of scanning events
for readiness all wakeups from ep_poll_callback have been already fired and
new events have been already accounted in ready list (ep_poll_callback()
calls
the same ep_poll_safewake()). Here, frankly, I'm not 100% sure and probably
missing some corner cases.
Thoughts?
So the: 'wake_up(&ep->wq);' part, I think is about waking up other
threads that may be in waiting in epoll_wait(). For example, there may
be multiple threads doing epoll_wait() on the same epoll fd, and the
logic above seems to say thread 1 may have processed say N events and
now its going to to go off to work those, so let's wake up thread 2 now
to handle the next chunk.
So I think removing all that even for the
depth 0 case is going to change some behavior here. So perhaps, it
should be removed for all depths except for 0? And if so, it may be
better to make 2 patches here to separate these changes.
For the nested wakeups, I agree that the extra wakeups seem unnecessary
and it may make sense to remove them for all depths. I don't think the
nested epoll semantics are particularly well spelled out, and afaict,
nested epoll() has behaved this way for quite some time. And the current
behavior is not bad in the way that a missing wakeup or false negative
would be.