Re: [PATCH 5/9] readahead: on-demand readahead logic

From: Fengguang Wu
Date: Tue Jun 12 2007 - 06:35:35 EST


Hi Rusty,

On Tue, Jun 12, 2007 at 02:36:26PM +1000, Rusty Russell wrote:
> On Thu, 2007-05-17 at 06:47 +0800, Fengguang Wu wrote:
> > +static unsigned long
> > +ondemand_readahead(struct address_space *mapping,
> > + struct file_ra_state *ra, struct file *filp,
> > + struct page *page, pgoff_t offset,
> > + unsigned long req_size)
> > +{
> > + unsigned long max; /* max readahead pages */
> > + pgoff_t ra_index; /* readahead index */
> > + unsigned long ra_size; /* readahead size */
> > + unsigned long la_size; /* lookahead size */
> > + int sequential;
> > +
> > + max = ra->ra_pages;
> > + sequential = (offset - ra->prev_index <= 1UL) || (req_size > max);
>
> This <= 1UL seems weird. prev_index is end of last request, so I'd
> expect offset == prev_index + 1 for sequential reads? Does offset ==
> ra->prev_index happen? If not, this would be clearer as (offset ==
> ra->prev_index + 1).

It's possible to have (offset == ra->prev_index) when someone is doing
1K reads or 10K reads, which do not always align to page boundaries.

> (prev_index is not a great name either, but that's not your patch 8).

It was just renamed from `prev_page', hehe.

> > + /*
> > + * Lookahead/readahead hit, assume sequential access.
> > + * Ramp up sizes, and push forward the readahead window.
> > + */
> > + if (offset && (offset == ra->lookahead_index ||
> > + offset == ra->readahead_index)) {
> > + ra_index = ra->readahead_index;
> > + ra_size = get_next_ra_size2(ra, max);
> > + la_size = ra_size;
> > + goto fill_ra;
> > + }
>
> Will offset hit lookahead_index or readahead_index exactly? Should this
> be checking the range from offset to offset + req_size?

Yes, normally lookahead_index will be hit(1). But in case readahead is
canceled because of IO congestion at that time, readahead_index will
be hit later(2).

The readahead code is called on two possible conditions:
(1) page != NULL and PageReadahead(page)
It will be an asynchronous readahead.
In this case, (offset == ra->lookahead_index) indicates sequential
reads that have been associated with a valid readahead window.

(2) page == NULL
It will be a synchronous readahead.
In this case, (offset == ra->readahead_index) indicates sequential
reads that has just consumed all of the readahead pages.

> > + ra_index = offset;
> > + ra_size = get_init_ra_size(req_size, max);
> > + la_size = ra_size > req_size ? ra_size - req_size : ra_size;
>
> So if we're doing a big sequential read, ra_size < req_size, so next
> time offset will be > ra->readahead_index and the "ramp up sizes" code
> won't get run?

For big reads, (ra_size = max) and (max < req_size), la_size will be
equal to ra_size, or max. So after this readahead invocation submits
max pages of I/O and returns to do_generic_mapping_read(), it will
*immediately* be called again because of lookahead hit:

if (!page) {
page_cache_readahead_ondemand(mapping,
&ra, filp, page,
index, last_index - index);
page = find_get_page(mapping, index);
if (unlikely(page == NULL))
goto no_cached_page;
}
lookahead hit: if (PageReadahead(page)) {
page_cache_readahead_ondemand(mapping,
&ra, filp, page,
index, last_index - index);
}

Then it will submit another max pages of readahead I/O, whether or not
the size ramp up code will be executed: either the remaining request
size is still > max and get_init_ra_size() is called, or the remaining
request size is <= max and get_next_ra_size() is called, in both cases
they will return max. This behavior is inherited from the current
readahead, and makes sense.

> > + /*
> > + * Hit on a lookahead page without valid readahead state.
> > + * E.g. interleaved reads.
> > + * Not knowing its readahead pos/size, bet on the minimal possible one.
> > + */
> > + if (page) {
> > + ra_index++;
> > + ra_size = min(4 * ra_size, max);
> > + }
>
> If I understand correctly, it's expected to happen when we have multiple
> streams: we previously marked the lookahead page, but then the other
> stream changed the ra to somewhere else in the file. We now change it
> back to our stream, but we've lost information so we make it up.

Yeah, exactly!

> This seems a little like two functions crammed into one. Do you think
> page_cache_readahead_ondemand() should be split into
> "page_cache_readahead()" which doesn't take a page*, and
> "page_cache_check_readahead_page()" which is an inline which does the
> PageReadahead(page) check as well?

page_cache_check_readahead_page(..., page) is a good idea.
But which part of the code should we put to page_cache_readahead()
that does not take a page param?

Thank you,
Fengguang

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/