Re: Random file I/O regressions in 2.6

From: Andrew Morton
Date: Mon May 10 2004 - 15:22:03 EST

Ram Pai <linuxram@xxxxxxxxxx> wrote:
> > Ram, can you take a look at fixing that up please? Something clean, not
> > more hacks ;) I'd also be interested in an explanation of what the extra
> > page is for. The little comment in there doesn't really help.
> The reason for the extra page read is as follows:
> Consider 16k random read i/os. Reads are generated 4 pages at a time.
> The readahead is triggered when the 4th page in the 'current-window' is
> touched.

Right. We've added two whole unsigned longs to the file_struct to track
the access patterns. That should be sufficient for us to detect when the
access pattern is random, and to then not perform readahead due to a
current-window miss *at all*.

So that extra page can go away, and:

--- 25/mm/readahead.c~a Mon May 10 13:16:59 2004
+++ 25-akpm/mm/readahead.c Mon May 10 13:17:22 2004
@@ -492,21 +492,17 @@ do_io:
 		if (ra->ahead_start == 0) {
 			/*
-			 * if the average io-size is less than maximum
+			 * If the average io-size is less than maximum
 			 * readahead size of the file the io pattern is
 			 * sequential. Hence bring in the readahead window
 			 * immediately.
-			 * Else the i/o pattern is random. Bring
-			 * in the readahead window only if the last page of
-			 * the current window is accessed (lazy readahead).
 			 */
 			unsigned long average = ra->average;

 			if (ra->serial_cnt > average)
 				average = (ra->serial_cnt + ra->average) / 2;

-			if ((average >= max) || (offset == (ra->start +
-							ra->size - 1))) {
+			if (average >= max) {
 				ra->ahead_start = ra->start + ra->size;
 				ra->ahead_size = ra->next_size;
 				actual = do_page_cache_readahead(mapping, filp,


That way, we read the correct amount of data, and we only start I/O when we
know the application is going to actually use the data.

This may cause problems when the application transitions from seeky-access
to linear-access.

Does it sound feasible?

> Probably we may see marginal degradation of this optimization with 16k
> i/o but the amount of wastage avoided by this optimization (hack)
> is great when random i/o is of larger size. I think it was 4% better
> performance on DSS workload with 64k random reads.

64k sounds unusually large. We need top performance at 8k too.

> Do you still think it's a hack?

yup ;)

> Also I think with sysbench workload and Andrew's ra-copy patch, we
> might be losing some benefits of some of the optimization because
> if two threads simultaneously work with copies of the same ra structure
> and update it, the optimization effect reflected in one of the
> ra-structure is lost depending on which ra structure gets copied back
> last.

hm, maybe. That only makes a difference if two threads are accessing the
same fd at the same time, and it was really bad before the patch. The IO
patterns seemed OK to me with the patch. Except it's reading one page too
