Hi,
I have been digging through the read-ahead code in mm/filemap.c and
elsewhere and have noticed a few anomalies, some of which are
addressed by the patch below.
1/ generic_file_readahead tries to do two sorts of readaheads -
"synchronous" and "asynchronous". As far as I can tell,
synchronous read-ahead happens when reading a page that we really
want. The read request is made larger to include some readahead.
Asynchronous read-ahead happens when the page we are after has
already been read, but we think it might be time to read a bit
more.
Currently, generic_file_readahead never gets to do any
asynchronous readahead. This is because it is never called when
the page that we want is already in memory. It is only called
when waiting for the "current" page to be read.
This can be fixed by inserting a call to generic_file_readahead
just before the "page_ok" label in generic_file_read as in the
patch below. This makes the code more consistent with the code in
2.2 which makes a similar call at a similar time.
2/ generic_file_readahead sometimes does extra synchronous
readaheads that aren't necessary. This should not cause extra disc
I/O, but will needlessly check a bunch of pages and find that they
are already in the cache.
The scenario goes like this, and requires the first patch:
When reading page X, a synchronous read request for pages X to
X+N-1 is made. Due to fragmentation (or some other reason), pages
X and X+1 are successfully read, and the remaining pages are still
on the request queue. generic_file_readahead processes X and X+1
and starts an asynchronous read-ahead for pages X+N to X+2N-1,
and then notices that X+2 isn't ready yet and calls
generic_file_readahead while it waits. Due to the way the tests
work in generic_file_readahead, it concludes that it is time for a
synchronous readahead for pages X+2 to X+N+1 (or thereabouts),
and tries to read all those pages (only to find that they are
already being read).
Changing the test under "if (PageLocked(page))" to consider the
full readahead window (f_rawin) instead of just the size of the
last readahead request made (f_ralen) fixes this. See the patch.
Neither of these has a substantial performance impact that I can
measure, but that may be because modern disc drives do a lot of
readahead themselves. Possibly it would be more noticeable on a
fragmented file.
Another issue that I have not addressed in the patch is that the
value "READA" as passed to ll_rw_block seems of little significance,
and should probably be done away with for the sake of cleanliness.
Passing "READA" is the same as passing READ, except that it won't
block if there isn't a spare request structure to store the request
in.
This isn't used by readahead on files (as there is no difference
between the read requests that generic_file_readahead and
generic_file_read make), and isn't used on raw reading from block
devices, as block_dev.c(block_read) always uses "READ" even for
read-ahead blocks. It is used by ext2/dir.c when doing readahead on
directories, and by a couple of other filesystems which use breada to
read in directories or other metadata. This doesn't feel like enough
justification to keep the concept around. Maybe it is something that
could be cleaned up in 2.5???
NeilBrown
--- ./mm/filemap.c 2000/08/22 00:15:34 1.1
+++ ./mm/filemap.c 2000/08/22 00:25:09
@@ -899,7 +899,7 @@
* page only.
*/
if (PageLocked(page)) {
- if (!filp->f_ralen || index >= raend || index + filp->f_ralen < raend) {
+ if (!filp->f_ralen || index >= raend || index + filp->f_rawin < raend) {
raend = index;
if (raend < end_index)
max_ahead = filp->f_ramax;
@@ -1076,6 +1076,7 @@
if (!Page_Uptodate(page))
goto page_not_up_to_date;
+ generic_file_readahead(reada_ok, filp, inode, page);
page_ok:
/*
* Ok, we have the page, and it's up-to-date, so