[GIT PULL] io_uring changes for 5.9-rc1

From: Jens Axboe
Date: Sun Aug 02 2020 - 17:41:29 EST


Hi Linus,

Lots of cleanups in here, hardening the code and/or making it easier to
read, plus bug fixes. There's also a core feature/change: support for
real async buffered reads. With the latter in place, we just need
buffered write async support and we're done relying on kthreads for the
fast path. In detail:

- Cleanup how memory accounting is done on ring setup/free (Bijan)

- sq array offset calculation fixup (Dmitry)

- Consistently handle blocking off O_DIRECT submission path (me)

- Support proper async buffered reads, instead of relying on kthread
offload for that. This uses the page waitqueue to drive retries from
task_work, like we handle poll based retry. (me)

- IO completion optimizations (me)

- Fix race with accounting and ring fd install (me)

- Support EPOLLEXCLUSIVE (Jiufei)

- Get rid of the io_kiocb unionizing, made possible by shrinking other
bits (Pavel)

- Completion side cleanups (Pavel)

- Cleanup REQ_F_ flags handling, and kill off many of them (Pavel)

- Request environment grabbing cleanups (Pavel)

- File and socket read/write cleanups (Pavel)

- Improve kiocb_set_rw_flags() (Pavel)

- Tons of fixes and cleanups (Pavel)

- IORING_SQ_NEED_WAKEUP clear fix (Xiaoguang)

This will throw a few merge conflicts. One is due to the IOCB_NOIO
addition that happened late in 5.8-rc, the other is due to a change in
for-5.9/block. Both are trivial to fix up; I'm attaching the merge
resolution from when I pulled it in locally.

Please pull!


The following changes since commit 4ae6dbd683860b9edc254ea8acf5e04b5ae242e5:

io_uring: fix lockup in io_fail_links() (2020-07-24 12:51:33 -0600)

are available in the Git repository at:

git://git.kernel.dk/linux-block.git tags/for-5.9/io_uring-20200802

for you to fetch changes up to fa15bafb71fd7a4d6018dae87cfaf890fd4ab47f:

io_uring: flip if handling after io_setup_async_rw (2020-08-01 11:02:57 -0600)

----------------------------------------------------------------
for-5.9/io_uring-20200802

----------------------------------------------------------------
Bijan Mottahedeh (4):
io_uring: add wrappers for memory accounting
io_uring: rename ctx->account_mem field
io_uring: report pinned memory usage
io_uring: separate reporting of ring pages from registered pages

Dan Carpenter (1):
io_uring: fix a use after free in io_async_task_func()

Dmitry Vyukov (1):
io_uring: fix sq array offset calculation

Jens Axboe (31):
block: provide plug based way of signaling forced no-wait semantics
io_uring: always plug for any number of IOs
io_uring: catch -EIO from buffered issue request failure
io_uring: re-issue block requests that failed because of resources
mm: allow read-ahead with IOCB_NOWAIT set
mm: abstract out wake_page_match() from wake_page_function()
mm: add support for async page locking
mm: support async buffered reads in generic_file_buffered_read()
fs: add FMODE_BUF_RASYNC
block: flag block devices as supporting IOCB_WAITQ
xfs: flag files as supporting buffered async reads
btrfs: flag files as supporting buffered async reads
mm: add kiocb_wait_page_queue_init() helper
io_uring: support true async buffered reads, if file provides it
Merge branch 'async-buffered.8' into for-5.9/io_uring
io_uring: provide generic io_req_complete() helper
io_uring: add 'io_comp_state' to struct io_submit_state
io_uring: pass down completion state on the issue side
io_uring: pass in completion state to appropriate issue side handlers
io_uring: enable READ/WRITE to use deferred completions
io_uring: use task_work for links if possible
Merge branch 'io_uring-5.8' into for-5.9/io_uring
io_uring: clean up io_kill_linked_timeout() locking
Merge branch 'io_uring-5.8' into for-5.9/io_uring
io_uring: abstract out task work running
io_uring: use new io_req_task_work_add() helper throughout
io_uring: only call kfree() for a non-zero pointer
io_uring: get rid of __req_need_defer()
io_uring: remove dead 'ctx' argument and move forward declaration
Merge branch 'io_uring-5.8' into for-5.9/io_uring
io_uring: don't touch 'ctx' after installing file descriptor

Jiufei Xue (2):
io_uring: change the poll type to be 32-bits
io_uring: use EPOLLEXCLUSIVE flag to aoid thundering herd type behavior

Pavel Begunkov (90):
io_uring: remove setting REQ_F_MUST_PUNT in rw
io_uring: remove REQ_F_MUST_PUNT
io_uring: set @poll->file after @poll init
io_uring: kill NULL checks for submit state
io_uring: fix NULL-mm for linked reqs
io-wq: compact io-wq flags numbers
io-wq: return next work from ->do_work() directly
io_uring: fix req->work corruption
io_uring: fix punting req w/o grabbed env
io_uring: fix feeding io-wq with uninit reqs
io_uring: don't mark link's head for_async
io_uring: fix missing io_grab_files()
io_uring: fix refs underflow in io_iopoll_queue()
io_uring: remove inflight batching in free_many()
io_uring: dismantle req early and remove need_iter
io_uring: batch-free linked requests as well
io_uring: cosmetic changes for batch free
io_uring: kill REQ_F_LINK_NEXT
io_uring: clean up req->result setting by rw
io_uring: do task_work_run() during iopoll
io_uring: fix iopoll -EAGAIN handling
io_uring: fix missing wake_up io_rw_reissue()
io_uring: deduplicate freeing linked timeouts
io_uring: replace find_next() out param with ret
io_uring: kill REQ_F_TIMEOUT
io_uring: kill REQ_F_TIMEOUT_NOSEQ
io_uring: fix potential use after free on fallback request free
io_uring: don't pass def into io_req_work_grab_env
io_uring: do init work in grab_env()
io_uring: factor out grab_env() from defer_prep()
io_uring: do grab_env() just before punting
io_uring: don't fail iopoll requeue without ->mm
io_uring: fix NULL mm in io_poll_task_func()
io_uring: simplify io_async_task_func()
io_uring: optimise io_req_find_next() fast check
io_uring: fix missing ->mm on exit
io_uring: fix mis-refcounting linked timeouts
io_uring: keep queue_sqe()'s fail path separately
io_uring: fix lost cqe->flags
io_uring: don't delay iopoll'ed req completion
io_uring: fix stopping iopoll'ing too early
io_uring: briefly loose locks while reaping events
io_uring: partially inline io_iopoll_getevents()
io_uring: remove nr_events arg from iopoll_check()
io_uring: don't burn CPU for iopoll on exit
io_uring: rename sr->msg into umsg
io_uring: use more specific type in rcv/snd msg cp
io_uring: extract io_sendmsg_copy_hdr()
io_uring: replace rw->task_work with rq->task_work
io_uring: simplify io_req_map_rw()
io_uring: add a helper for async rw iovec prep
io_uring: follow **iovec idiom in io_import_iovec
io_uring: share completion list w/ per-op space
io_uring: rename ctx->poll into ctx->iopoll
io_uring: use inflight_entry list for iopoll'ing
io_uring: use completion list for CQ overflow
io_uring: add req->timeout.list
io_uring: remove init for unused list
io_uring: use non-intrusive list for defer
io_uring: remove sequence from io_kiocb
io_uring: place cflags into completion data
io_uring: inline io_req_work_grab_env()
io_uring: remove empty cleanup of OP_OPEN* reqs
io_uring: alloc ->io in io_req_defer_prep()
io_uring/io-wq: move RLIMIT_FSIZE to io-wq
io_uring: simplify file ref tracking in submission state
io_uring: indent left {send,recv}[msg]()
io_uring: remove extra checks in send/recv
io_uring: don't forget cflags in io_recv()
io_uring: free selected-bufs if error'ed
io_uring: move BUFFER_SELECT check into *recv[msg]
io_uring: extract io_put_kbuf() helper
io_uring: don't open-code recv kbuf managment
io_uring: don't miscount pinned memory
io_uring: return locked and pinned page accounting
tasks: add put_task_struct_many()
io_uring: batch put_task_struct()
io_uring: don't do opcode prep twice
io_uring: deduplicate io_grab_files() calls
io_uring: mark ->work uninitialised after cleanup
io_uring: fix missing io_queue_linked_timeout()
io-wq: update hash bits
io_uring: de-unionise io_kiocb
io_uring: deduplicate __io_complete_rw()
io_uring: fix racy overflow count reporting
io_uring: fix stalled deferred requests
io_uring: consolidate *_check_overflow accounting
io_uring: get rid of atomic FAA for cq_timeouts
fs: optimise kiocb_set_rw_flags()
io_uring: flip if handling after io_setup_async_rw

Randy Dunlap (1):
io_uring: fix function args for !CONFIG_NET

Xiaoguang Wang (1):
io_uring: clear IORING_SQ_NEED_WAKEUP after executing task works

block/blk-core.c | 6 +
fs/block_dev.c | 2 +-
fs/btrfs/file.c | 2 +-
fs/io-wq.c | 14 +-
fs/io-wq.h | 11 +-
fs/io_uring.c | 2588 +++++++++++++++++++++++------------------
fs/xfs/xfs_file.c | 2 +-
include/linux/blkdev.h | 1 +
include/linux/fs.h | 26 +-
include/linux/pagemap.h | 75 ++
include/linux/sched/task.h | 6 +
include/uapi/linux/io_uring.h | 4 +-
mm/filemap.c | 110 +-
tools/io_uring/liburing.h | 6 +-
14 files changed, 1658 insertions(+), 1195 deletions(-)

--
Jens Axboe



commit 32a5169a5562db6a09a2d85164e0079913ecc227
Merge: 5fb023fb414a fa15bafb71fd
Author: Jens Axboe <axboe@xxxxxxxxx>
Date: Sun Aug 2 10:43:35 2020 -0600

Merge branch 'for-5.9/io_uring' into test

* for-5.9/io_uring: (127 commits)
io_uring: flip if handling after io_setup_async_rw
fs: optimise kiocb_set_rw_flags()
io_uring: don't touch 'ctx' after installing file descriptor
io_uring: get rid of atomic FAA for cq_timeouts
io_uring: consolidate *_check_overflow accounting
io_uring: fix stalled deferred requests
io_uring: fix racy overflow count reporting
io_uring: deduplicate __io_complete_rw()
io_uring: de-unionise io_kiocb
io-wq: update hash bits
io_uring: fix missing io_queue_linked_timeout()
io_uring: mark ->work uninitialised after cleanup
io_uring: deduplicate io_grab_files() calls
io_uring: don't do opcode prep twice
io_uring: clear IORING_SQ_NEED_WAKEUP after executing task works
io_uring: batch put_task_struct()
tasks: add put_task_struct_many()
io_uring: return locked and pinned page accounting
io_uring: don't miscount pinned memory
io_uring: don't open-code recv kbuf managment
...

Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>

diff --cc block/blk-core.c
index 93104c7470e8,62a4904db921..d9d632639bd1
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@@ -956,13 -952,30 +956,18 @@@ static inline blk_status_t blk_check_zo
return BLK_STS_OK;
}

-static noinline_for_stack bool
-generic_make_request_checks(struct bio *bio)
+static noinline_for_stack bool submit_bio_checks(struct bio *bio)
{
- struct request_queue *q;
- int nr_sectors = bio_sectors(bio);
+ struct request_queue *q = bio->bi_disk->queue;
blk_status_t status = BLK_STS_IOERR;
+ struct blk_plug *plug;
- char b[BDEVNAME_SIZE];

might_sleep();

- q = bio->bi_disk->queue;
- if (unlikely(!q)) {
- printk(KERN_ERR
- "generic_make_request: Trying to access "
- "nonexistent block-device %s (%Lu)\n",
- bio_devname(bio, b), (long long)bio->bi_iter.bi_sector);
- goto end_io;
- }
-
+ plug = blk_mq_plug(q, bio);
+ if (plug && plug->nowait)
+ bio->bi_opf |= REQ_NOWAIT;
+
/*
* For a REQ_NOWAIT based request, return -EOPNOTSUPP
* if queue is not a request based queue.
diff --cc include/linux/fs.h
index 41cd993ec0f6,e535543d31d9..b7f1f1b7d691
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@@ -315,7 -318,8 +318,9 @@@ enum rw_hint
#define IOCB_SYNC (1 << 5)
#define IOCB_WRITE (1 << 6)
#define IOCB_NOWAIT (1 << 7)
+ /* iocb->ki_waitq is valid */
+ #define IOCB_WAITQ (1 << 8)
+#define IOCB_NOIO (1 << 9)

struct kiocb {
struct file *ki_filp;
diff --cc mm/filemap.c
index 385759c4ce4b,a5b1fa8f7ce4..4e39c1f4c7d9
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@@ -2028,8 -2044,6 +2044,8 @@@ find_page

page = find_get_page(mapping, index);
if (!page) {
- if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO))
++ if (iocb->ki_flags & IOCB_NOIO)
+ goto would_block;
page_cache_sync_readahead(mapping,
ra, filp,
index, last_index - index);
@@@ -2164,7 -2185,7 +2191,7 @@@ page_not_up_to_date_locked
}

readpage:
- if (iocb->ki_flags & IOCB_NOIO) {
- if (iocb->ki_flags & IOCB_NOWAIT) {
++ if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO)) {
unlock_page(page);
put_page(page);
goto would_block;
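
For readers following the mm/filemap.c resolution above, here is a rough
user-space sketch of how the two resolved checks treat the flags. The
flag values are copied from the include/linux/fs.h hunk; the helper
names miss_would_block() and readpage_would_block() are made up for
illustration and do not exist in the kernel. The real code does a
"goto would_block" inside generic_file_buffered_read() rather than
returning a bool.

```c
#include <stdbool.h>

/* Flag values as they appear in the resolved include/linux/fs.h hunk. */
#define IOCB_NOWAIT	(1 << 7)
#define IOCB_WAITQ	(1 << 8)
#define IOCB_NOIO	(1 << 9)

/*
 * Hypothetical helper, not a kernel function: models the find_page
 * check after the merge. On a page cache miss only IOCB_NOIO bails
 * out; IOCB_NOWAIT falls through so read-ahead can still be started.
 */
static bool miss_would_block(int ki_flags)
{
	return (ki_flags & IOCB_NOIO) != 0;
}

/*
 * Hypothetical helper, not a kernel function: models the readpage:
 * check after the merge. Either IOCB_NOWAIT or IOCB_NOIO bails out
 * before the page would be read synchronously.
 */
static bool readpage_would_block(int ki_flags)
{
	return (ki_flags & (IOCB_NOWAIT | IOCB_NOIO)) != 0;
}
```

In this model an IOCB_NOWAIT request passes the miss check but stops at
readpage:, while an IOCB_NOIO request stops at both, which is what the
"++" lines in the resolution amount to.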