aio poll and a new in-kernel poll API V10

From: Christoph Hellwig
Date: Fri May 11 2018 - 07:24:03 EST


Hi all,

this series adds support for the IOCB_CMD_POLL operation to poll for the
readiness of file descriptors using the aio subsystem. The API is based
on patches that existed in RHAS2.1 and RHEL3, which means it is already
supported by libaio. To implement the poll support efficiently, two new
poll methods are introduced in struct file_operations: get_poll_head
and poll_mask. The first one returns a wait_queue_head to wait on
(its lifetime is bound to the file), and the second does a non-blocking
check for the POLL* events. This allows aio poll to work without
any additional context switches, unlike epoll.
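To make the split between the two methods concrete, here is a minimal userspace model in C. Everything in it is hypothetical: `struct file_model`, `struct wait_head_model`, `aio_poll_submit`, `file_signal` and the `demo_*` methods are invented for illustration and are not the kernel's types or this series' code; locking, cancellation and the real wait_queue_head machinery are left out. It only sketches the idea: the request is queued on the head returned by `get_poll_head`, readiness is checked with the non-blocking `poll_mask`, and when the file is signalled the wake callback re-checks `poll_mask` and completes the request in place rather than waking a separate thread.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical userspace model of the get_poll_head/poll_mask split.
 * Not the kernel's types: locking, cancellation and the real
 * wait_queue_head machinery are deliberately left out. */
struct file_model;

struct waiter {
	void (*func)(struct waiter *w); /* run directly from the waker */
	struct file_model *file;
	short events;                   /* requested POLL* bits */
	short result;                   /* completed mask, 0 while pending */
	struct waiter *next;
};

struct wait_head_model {
	struct waiter *first;           /* queued waiters, newest first */
};

struct file_ops_model {
	/* Queue to wait on; its lifetime is bound to the file.
	 * This sketch assumes it never returns NULL. */
	struct wait_head_model *(*get_poll_head)(struct file_model *f, short events);
	/* Non-blocking check returning the ready POLL* bits. */
	short (*poll_mask)(struct file_model *f, short events);
};

struct file_model {
	const struct file_ops_model *ops;
	struct wait_head_model head;
	short ready;                    /* currently signalled POLL* bits */
};

/* Wake callback: re-check with poll_mask and complete in place,
 * so no separate thread has to run to finish the request. */
static void aio_poll_wake(struct waiter *w)
{
	if (w->result)
		return;                 /* already completed */
	w->result = w->file->ops->poll_mask(w->file, w->events);
}

/* Returns 1 if the request completed at submission, 0 if it was queued,
 * or -EINVAL if the file has no poll_mask method (as in the v2 notes). */
static int aio_poll_submit(struct file_model *f, struct waiter *w, short events)
{
	assert(f != NULL && w != NULL);
	if (!f->ops->poll_mask)
		return -EINVAL;

	*w = (struct waiter){ .func = aio_poll_wake, .file = f, .events = events };

	/* Queue before checking so that, with a concurrent waker, an event
	 * arriving between the two steps would not be lost. */
	struct wait_head_model *head = f->ops->get_poll_head(f, events);
	w->next = head->first;
	head->first = w;

	short mask = f->ops->poll_mask(f, events);
	if (mask) {
		head->first = w->next;  /* dequeue: w is still at the front */
		w->next = NULL;
		w->result = mask;
		return 1;
	}
	return 0;
}

/* Waker side: mark bits ready and run every queued callback. */
static void file_signal(struct file_model *f, short bits)
{
	f->ready |= bits;
	for (struct waiter *w = f->head.first; w != NULL; w = w->next)
		w->func(w);
}

/* Hypothetical demo file type: ready bits are whatever was signalled. */
static struct wait_head_model *demo_get_poll_head(struct file_model *f, short events)
{
	(void)events;
	return &f->head;
}

static short demo_poll_mask(struct file_model *f, short events)
{
	return (short)(f->ready & events);
}

static const struct file_ops_model demo_ops = { demo_get_poll_head, demo_poll_mask };
static const struct file_ops_model no_mask_ops = { demo_get_poll_head, NULL };
```

In this sketch a caller submits with `aio_poll_submit(&file, &req, POLLIN)`; a return of 0 means the request is queued, and `req.result` is filled in later by `file_signal()` from the waker's own context.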

This series sits on top of the aio-fsync series that also includes
support for io_pgetevents.

The changes were sponsored by Scylladb and improve the performance
of the seastar framework by up to 10%, while also removing the need
for a privileged SCHED_FIFO epoll listener thread.

git://git.infradead.org/users/hch/vfs.git aio-poll.10

Gitweb:

http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/aio-poll.10

Libaio changes:

https://pagure.io/libaio.git io-poll

Seastar changes (not updated for the new io_pgetevents ABI yet):

https://github.com/avikivity/seastar/commits/aio


Changes since v9:
- add to the delayed_cancel_reqs earlier to avoid a race
- get rid of POLL_TO_PTR magic

Changes since v8:
- make delayed cancellation conditional again
- add a cancel_kiocb file operation to split delayed vs normal cancel

Changes since v7:
- make delayed cancellation safe and unconditional

Changes since v6:
- reworked cancellation

Changes since v5:
- small changelog updates
- rebased on top of the aio-fsync changes

Changes since v4:
- rebased on top of Linux 4.16-rc4

Changes since v3:
- remove the pre-sleep ->poll_mask call in vfs_poll, and
allow ->get_poll_head to return POLL* values.

Changes since v2:
- removed a double initialization
- new vfs_get_poll_head helper
- document that ->get_poll_head can return NULL
- call ->poll_mask before sleeping
- various ACKs
- add conversion of random to ->poll_mask
- add conversion of af_alg to ->poll_mask
- lacking ->poll_mask support now returns -EINVAL for IOCB_CMD_POLL
- reshuffled the series so that prep patches and everything not
requiring the new in-kernel poll API is in the beginning

Changes since v1:
- handle the NULL ->poll case in vfs_poll
- dropped the file argument to the ->poll_mask socket operation
- replace the ->pre_poll socket operation with ->get_poll_head as
in the file operations
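The v1 to v3 notes describe how vfs_poll chooses between the existing ->poll method and the new pair. The following is a hedged sketch of one reading of that order, again as a userspace model with invented names (`struct vfile`, `vfs_poll_model`, `demo_*`); it is not the series' vfs_poll. Two points are assumptions: a NULL return from `get_poll_head` is treated as "nothing to wait on" and reported as always ready, and that always-ready mask is simplified to `POLLIN | POLLOUT`. The v3 option of returning POLL* values from ->get_poll_head is not modeled.

```c
#include <poll.h>
#include <stddef.h>

/* Hypothetical userspace model of the dispatch order described in the
 * changelog; not the kernel's vfs_poll(). */

/* Assumed, simplified "always ready" mask for files that cannot be polled. */
#define MODEL_DEFAULT_POLLMASK ((short)(POLLIN | POLLOUT))

struct vfile;

struct vhead {
	int waiters;    /* counts registrations, standing in for a real queue */
};

struct vfile_ops {
	short (*poll)(struct vfile *f, short events);   /* legacy method */
	struct vhead *(*get_poll_head)(struct vfile *f, short events);
	short (*poll_mask)(struct vfile *f, short events);
};

struct vfile {
	const struct vfile_ops *ops;
	struct vhead head;
	short ready;    /* currently signalled POLL* bits */
};

/* Prefer ->poll, then the get_poll_head/poll_mask pair, else report the
 * default mask (the "NULL ->poll" case). want_wait says whether the
 * caller intends to sleep and therefore needs to be queued. */
static short vfs_poll_model(struct vfile *f, short events, int want_wait)
{
	const struct vfile_ops *ops = f->ops;

	if (ops->poll)
		return ops->poll(f, events);
	if (!ops->poll_mask)
		return MODEL_DEFAULT_POLLMASK;

	if (want_wait) {
		struct vhead *head =
			ops->get_poll_head ? ops->get_poll_head(f, events) : NULL;
		if (!head)              /* assumed: nothing to wait on */
			return MODEL_DEFAULT_POLLMASK;
		head->waiters++;        /* stand-in for adding to the queue */
	}
	return ops->poll_mask(f, events);
}

/* Hypothetical demo methods. */
static struct vhead *demo_head(struct vfile *f, short events)
{
	(void)events;
	return &f->head;
}

static struct vhead *demo_no_head(struct vfile *f, short events)
{
	(void)f;
	(void)events;
	return NULL;
}

static short demo_mask(struct vfile *f, short events)
{
	return (short)(f->ready & events);
}

static const struct vfile_ops mask_ops = { NULL, demo_head, demo_mask };
static const struct vfile_ops headless_ops = { NULL, demo_no_head, demo_mask };
static const struct vfile_ops no_poll_ops = { NULL, NULL, NULL };
```

Here `vfs_poll_model(&file, POLLIN, 1)` on a file that implements only the new pair registers the caller on the file's head before returning the current mask, while `want_wait == 0` performs only the non-blocking check.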