[RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case

From: Hou Tao
Date: Sat Oct 28 2017 - 08:53:28 EST


Hi,

We are optimizing the Request-Per-Second of nginx http server, and we found
that acquiring epmutex in eventpoll_release_file() will become a bottleneck
under the one-request-per-connection scenario. The following are some details
of the scenario:

* HTTP server (nginx):
* under ARM64 with 64 cores
* 64 worker processes, each worker is binded to a specific CPU
* keepalive_requests = 1 in nginx.conf: nginx will close the
connection fd after a reply is send
* HTTP benchmark tool (wrk):
* under x86-64 with 48 cores
* 16 threads, 64 connections per-thread

Before the patch, the RPS measured by wrk is ~220K, after applying
the patch the RPS is ~240K. We also measure the overhead of
eventpoll_release_file() and its children by perf: 29% before and
2% after.

In the following section I will explain the purposes of epmutex, and
the way of replacing it by using locks with a smaller granularity.

epmutex serves four purposes:
(1) serialize ep_loop_check() and ep_free()/eventpoll_release_file()
(a) ensure the validity of ep when clearing visited_list
The acquisition of epmutex in ep_free() prevent the freeing of ep.

It's fixed in patch 2: when freeing ep, remove it from visited_list.
When there is no nested-epoll cast, ep will not been added to
visited_list, so we check the condition first. If it has already been
added to visited_list, we need to wait for the release of epmutex.

(2) serialize reverse_path_check() and ep_free()/eventpoll_release_file()
(a) ensure the validity of file in tfile_check_list
epi->ffd.file was added to tfile_check_list under ep->mtx, but
was accessed without ep->mtx. The acquisition of epmutex in
eventpoll_release_file() prevent the freeing of file.

It's fixed in patch 3: when releasing file, remove it from
tfile_check_list. If it has been already added, we need to
wait for the release of epmutex.

(b) ensure the validity of epi->ep and epi->ep->file
The epmutex will prevent the freeing of ep and its related file,
so it's OK to access epi->ep under rcu read critical region.

The change is done in patch 4: we free ep by rcu, so it's OK
to access epi->ep->file under rcu read critical region. The file
has already been freed by rcu, so it's also OK to access its fields.

(3) serialize the concurrent invocations of epoll_ctl(EPOLL_CTL_ADD)
for the nested-epoll-fd case
(a) protect tfile_check_list and visited_list

There is nothing to do.

(4) serialize ep_free() and eventpoll_release_file()
(a) protect file->f_ep_links
eventpoll_release_file() will read the list through
file->f_ep_links, and modify it through epi->fllink.
ep_free() will modify it through epi->fllink.

It's fixed in patch 5: using rcu and list_first_or_null_rcu() to
iterate file->f_ep_links instead of epmutex.

(b) ensure the validity of epi->ep
When eventpoll_release_file() gets epi from file->f_ep_links,
epi->ep should still be valid.

It's fixed in patch 4 and 6: add an ref-counter to eventpoll and
free eventpoll by rcu.

(c) protect the removal of epi
Both ep_free() and eventpoll_release_file() will try to remove
the same epi, if one function has removed the epi, the other
function should not remove it again.

It's fixed in patch 7: check whether or not ep_free() has already
removed the epi before the invocation of ep_remove() in
eventpoll_release_file().

(d) ensure the validity of epi->ffd.file
When ep_remove() is invoked by ep_free(), epi->ffd.file should
still be valid.

Do not need to do anything: when ep_free() is invoking ep_remove()
and access epi->ffd.file, if the file is freeing, the freeing will
be blocked on ep->mtx, so it's OK to access the file in ep_remove().

Patch 1 just removes epmutex from ep_free() and eventpoll_release_file(),
and patch 8 enlarge the protected region of ep->mtx to protect against
the iteration of ep->rbr.

The patch set has passed the epoll related test cases in LTP, and we are
planing to run some torture or performance test cases for nested-epoll
cases.

Comments and questions are welcome.

Regards,

Tao
---
Hou Tao (8):
epoll: remove epmutex from ep_free() & eventpoll_release_file()
epoll: remove ep from visited_list when freeing ep
epoll: remove file from tfile_check_list when releasing file
epoll: free eventpoll by rcu to provide existence guarantee
epoll: iterate epi in file->f_ep_links by using list_first_or_null_rcu
epoll: ensure the validity of ep when removing epi in
eventpoll_release_file()
epoll: prevent the double-free of epi in eventpoll_release_file()
epoll: protect the iteration of ep->rbr by ep->mtx in ep_free()

fs/eventpoll.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 88 insertions(+), 14 deletions(-)

--
2.7.5