Strange issues with epoll since 5.0

From: Omar Kilani
Date: Mon Apr 15 2019 - 14:02:56 EST


Hi there,

I'm still trying to piece together a reproducible test that triggers
this, but I wanted to post in case someone goes "hmmm... change X
might have done this".

Basically, something's broken (or at least has changed enough to
cause problems in user space) in epoll since 5.0. It's still broken in
5.1-rc5.

It doesn't happen 100% of the time, and it's hard to pin down, but
I've observed the following:

* nginx not accepting connections under load
* A Java app that uses Netty / NIO seeing strange writability
semantics on channels, which confuses Netty / Java enough that it
doesn't properly flush written data to the socket.

I went and tested these Linux kernels:

4.20.17
4.19.32
4.14.111

The issue(s) do not show up on any of them.

I'm still actively chasing this up and will report back. I haven't
touched kernel code in 15 years, so I'm a little rusty. :)

Regards,
Omar