Re: [PATCH 2/2] epoll: atomically remove wait entry on wake up

From: Jason Baron
Date: Fri May 01 2020 - 16:06:19 EST

On 4/30/20 9:03 AM, Roman Penyaev wrote:
> This patch does two things:
>
> 1. fixes lost wakeup introduced by:
> 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll")
>
> 2. improves performance of event delivery.
>
> The problem is the following: if N (>1) threads are waiting on ep->wq
> for new events and M (>1) events arrive, it is quite likely that more
> than one wakeup hits the same wait queue entry, because there is a
> sizable window between the __add_wait_queue_exclusive() and the
> following __remove_wait_queue() calls in the ep_poll() function. This
> can lead to lost wakeups, because the thread that was woken up may
> not handle all the events in ->rdllist. (The problem is described in
> more detail here: https://lkml.org/lkml/2019/10/7/905)
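
To make the window concrete, the pre-patch waiting path in ep_poll()
looks roughly like this (a simplified sketch, not the exact source;
only the helpers named above are shown and the locking is elided into
comments):

	init_waitqueue_entry(&wait, current);	/* callback stays default_wake_function */

	/* add ourselves under the lock protecting ep->wq */
	__add_wait_queue_exclusive(&ep->wq, &wait);

	/* ... sleep until woken up or timed out ... */

	/*
	 * Window: &wait is still linked on ep->wq here, so a second
	 * wakeup can target this already-woken entry instead of the next
	 * exclusive waiter and is effectively lost if this thread does
	 * not consume the corresponding events.
	 */

	/* only now is the entry removed, again under the lock */
	__remove_wait_queue(&ep->wq, &wait);
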
>
> The idea of the current patch is to use init_wait() instead of
> init_waitqueue_entry(). Internally init_wait() sets
> autoremove_wake_function as the callback, which removes the wait entry
> from the list atomically (under the wq lock), so the next wakeup hits
> the next wait entry in the wait queue, preventing lost wakeups.
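
For reference, the auto-removal comes from the stock wait machinery;
roughly (paraphrasing include/linux/wait.h and kernel/sched/wait.c
from memory, not part of this patch):

	/* init_wait() installs autoremove_wake_function as the wakeup callback */
	#define init_wait(wait)						\
	do {								\
		(wait)->private = current;				\
		(wait)->func = autoremove_wake_function;	\
		INIT_LIST_HEAD(&(wait)->entry);			\
		(wait)->flags = 0;					\
	} while (0)

	/*
	 * The callback runs from __wake_up() with the wait queue head lock
	 * held, so a successful wakeup unlinks the entry atomically and a
	 * subsequent wakeup proceeds to the next exclusive waiter.
	 */
	int autoremove_wake_function(struct wait_queue_entry *wq_entry,
				     unsigned mode, int sync, void *key)
	{
		int ret = default_wake_function(wq_entry, mode, sync, key);

		if (ret)
			list_del_init(&wq_entry->entry);

		return ret;
	}
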
>
> The problem is reliably reproduced by the epoll60 test case [1].
>
> Removing the wait entry on wakeup also has performance benefits,
> because there is no need to take ep->lock and remove the wait entry
> from the queue after a successful wakeup. Here is the timing output
> of the epoll60 test case:
>
> With explicit wakeup from ep_scan_ready_list() (the state of the
> code prior to 339ddb53d373):
>
> real 0m6.970s
> user 0m49.786s
> sys 0m0.113s
>
> After this patch:
>
> real 0m5.220s
> user 0m36.879s
> sys 0m0.019s
>
> The other test case is stress-epoll [2], where one thread consumes
> all the events while other threads produce many events:
>
> With explicit wakeup from ep_scan_ready_list() (the state of the
> code prior to 339ddb53d373):
>
> threads  events/ms  run-time ms
>       8       5427         1474
>      16       6163         2596
>      32       6824         4689
>      64       7060         9064
>     128       6991        18309
>
> After this patch:
>
> threads  events/ms  run-time ms
>       8       5598         1429
>      16       7073         2262
>      32       7502         4265
>      64       7640         8376
>     128       7634        16767
>
> ("events/ms" represents event bandwidth, so higher is better;
> "run-time ms" is the overall time spent running the benchmark, so
> lower is better)
>
> [1] tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
> [2] https://github.com/rouming/test-tools/blob/master/stress-epoll.c
>
> Signed-off-by: Roman Penyaev <rpenyaev@xxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Khazhismel Kumykov <khazhy@xxxxxxxxxx>
> Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
> Cc: Heiher <r@xxxxxx>
> Cc: Jason Baron <jbaron@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Cc: linux-fsdevel@xxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> ---
> fs/eventpoll.c | 43 ++++++++++++++++++++++++-------------------
> 1 file changed, 24 insertions(+), 19 deletions(-)
>

Looks good to me and nice speedups.
Reviewed-by: Jason Baron <jbaron@xxxxxxxxxx>

Thanks,

-Jason