Re: [PATCH 7/22] Upfront readahead to help streaming AIO reads

From: Suparna Bhattacharya
Date: Fri Jul 02 2004 - 08:14:09 EST


On Fri, Jul 02, 2004 at 06:30:30PM +0530, Suparna Bhattacharya wrote:
> The patchset contains modifications and fixes to the AIO core
> to support the full retry model, an implementation of AIO
> support for buffered filesystem AIO reads and O_SYNC writes
> (the latter courtesy O_SYNC speedup changes from Andrew Morton),
> an implementation of AIO reads and writes to pipes (from
> Chris Mason) and AIO poll (again from Chris Mason).
>
> Full retry infrastructure and fixes
> [1] aio-retry.patch
> [2] 4g4g-aio-hang-fix.patch
> [3] aio-retry-elevated-refcount.patch
> [4] aio-splice-runlist.patch
>
> FS AIO read
> [5] aio-wait-page.patch
> [6] aio-fs_read.patch
> [7] aio-upfront-readahead.patch

--
Suparna Bhattacharya (suparna@xxxxxxxxxx)
Linux Technology Center
IBM Software Lab, India

-----------------------------------------
From: Suparna Bhattacharya <suparna@xxxxxxxxxx>

This patch modifies do_generic_mapping_read() so that, for AIO reads, it
issues readahead upfront for up to ra_pages pages of the requested range
before it starts waiting for any of the pages to become uptodate.

This leads to sane readahead behaviour and I/O ordering for the kind
of I/O patterns generated by streaming AIO reads, by ensuring that
I/O for as many consecutive blocks as possible in the first request
is issued before submission of the next request (note that, unlike
sync I/O, AIO can't wait for completion of the first request before
submitting the next).

The patch also takes care not to repeatedly issue readaheads for
subsequent AIO retries for the same request.
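
The retry check can be sketched as a standalone predicate. This is only
an illustration that restates the condition guarding the upfront loop in
the diff; should_readahead_upfront() and its parameters are hypothetical
names, not kernel interfaces.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not part of the patch. It restates the guard
 * around the upfront readahead loop:
 *   is_retry  - true if this kiocb is being retried (is_retried_kiocb())
 *   prev_page - page index last recorded in the readahead state
 *               (ra.prev_page)
 *   index     - page index at which this pass of the read starts
 */
static bool should_readahead_upfront(bool is_retry,
				     unsigned long prev_page,
				     unsigned long index)
{
	/* First submission: always announce the requested range upfront. */
	if (!is_retry)
		return true;

	/* Retry: only when it continues right after the recorded page. */
	return prev_page + 1 == index;
}
```

A retry that resumes somewhere other than prev_page + 1 skips the upfront
loop, so the same range is not submitted for readahead again.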

Upfront readahead is clipped to ra_pages (typically 128KB) to maintain
pipelined behaviour for very large requests, like sendfile of a large
file. The tradeoff is that when an individual request is larger than
ra_pages, I/O ordering won't be optimal for streaming AIOs.
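
The clipping arithmetic can be sketched in user space as follows. This is
a rough model under stated assumptions: 4KB pages (SKETCH_PAGE_SHIFT) are
an assumption, upfront_pages() is a hypothetical name, and the sketch
models neither max_sane_readahead() nor the congestion check.

```c
/* Assumed for illustration only: 4KB pages; the real shift is per-arch. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical helper: number of pages the upfront loop would walk for
 * a request of `count` bytes starting at byte offset `pos`. Follows the
 * loop bound in the patch: at least one page, at most ra_pages.
 * Zero-length requests are simply treated as needing no readahead here.
 */
static unsigned long long upfront_pages(unsigned long long pos,
					unsigned long long count,
					unsigned long long ra_pages)
{
	unsigned long long index, last, nr, limit;

	if (count == 0)
		return 0;

	index = pos >> SKETCH_PAGE_SHIFT;		/* first page touched */
	last = (pos + count - 1) >> SKETCH_PAGE_SHIFT;	/* last page touched */
	nr = last - index + 1;

	/* The loop always runs for i == 0, even if ra_pages is 0. */
	limit = ra_pages ? ra_pages : 1;

	return nr < limit ? nr : limit;
}
```

With these assumed values and ra_pages = 32 (128KB), a 64KB request at
offset 0 covers 16 pages and is announced in full, while a 1MB request
covers 256 pages and is clipped to 32.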

There's a good reason why these changes are limited to AIO.
For sendfile with O_NONBLOCK in a loop, the extra upfront readahead
issued on every iteration disturbs the sequentiality of the readahead
pattern, resulting in non-optimal behaviour (this showed up as a
regression in O_NONBLOCK sendfile for a large file). This isn't likely
to be a problem with AIO sendfile when it is implemented, since that
is unlikely to use O_NONBLOCK.


filemap.c | 37 ++++++++++++++++++++++++++++++++++++-
1 files changed, 36 insertions(+), 1 deletion(-)

--- aio/mm/filemap.c 2004-06-18 06:10:37.953164632 -0700
+++ aio-upfront-readahead/mm/filemap.c 2004-06-18 08:28:49.731622704 -0700
@@ -707,6 +707,34 @@ void do_generic_mapping_read(struct addr
 	index = *ppos >> PAGE_CACHE_SHIFT;
 	offset = *ppos & ~PAGE_CACHE_MASK;

+	if (unlikely(in_aio())) {
+		unsigned long i, last, nr;
+		/*
+		 * Let the readahead logic know upfront about all
+		 * the pages we'll need to satisfy this request while
+		 * taking care to avoid repeat readaheads during retries.
+		 * Required for reasonable IO ordering with multipage
+		 * streaming AIO requests.
+		 */
+		if ((!is_retried_kiocb(io_wait_to_kiocb(current->io_wait)))
+			|| (ra.prev_page + 1 == index)) {
+
+			last = (*ppos + desc->count - 1) >> PAGE_CACHE_SHIFT;
+			nr = max_sane_readahead(last - index + 1);
+
+			for (i = 0; (i < nr) && ((i == 0)||(i < ra.ra_pages));
+					i++) {
+				page_cache_readahead(mapping, &ra, filp,
+						index + i);
+				if (bdi_read_congested(
+					mapping->backing_dev_info)) {
+					printk("AIO readahead congestion\n");
+					break;
+				}
+			}
+		}
+	}
+
 	for (;;) {
 		struct page *page;
 		unsigned long end_index, nr, ret;
@@ -724,8 +752,15 @@ void do_generic_mapping_read(struct addr
 		}

 		cond_resched();
-		page_cache_readahead(mapping, &ra, filp, index);

+		/*
+		 * Take care to avoid disturbing the existing readahead
+		 * window (concurrent reads may be active for the same fd,
+		 * in the AIO case)
+		 */
+		if (!in_aio() || (ra.prev_page + 1 == index))
+			page_cache_readahead(mapping, &ra, filp, index);
+
 		nr = nr - offset;
 find_page:
 		page = find_get_page(mapping, index);