Re: fadvise interferes with readahead

From: Jaegeuk Hanse
Date: Wed Nov 21 2012 - 02:51:02 EST

Next message: Andrew Morton: "Re: kmem accounting netperf data"
Previous message: Linus Walleij: "[PATCH] gpiolib: rename pin range arguments"
In reply to: Claudio Freire: "Re: fadvise interferes with readahead"
Next in thread: Fengguang Wu: "Re: fadvise interferes with readahead"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 11/20/2012 10:58 PM, Fengguang Wu wrote:

On Tue, Nov 20, 2012 at 10:34:11AM -0300, Claudio Freire wrote:
On Tue, Nov 20, 2012 at 5:04 AM, Fengguang Wu <fengguang.wu@xxxxxxxxx> wrote:
Yes. The kernel readahead code by design will outperform simple
fadvise in the case of clustered random reads. Imagine the access
pattern 1, 3, 2, 6, 4, 9. fadvise will literally trigger 6 IOs, while
kernel readahead will likely trigger 3 IOs for 1, 3, 2-9, because on
the page miss for 2, it will detect the existence of history page 1
and do readahead properly. For hard disks, it's mainly the number of
IOs that matters. So even if kernel readahead loses some opportunities
to do async IO and possibly loads some extra pages that will never be
used, it still manages to perform much better.
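
The IO counts in the example above can be reproduced with a toy model.
This is only a sketch, not the kernel's readahead algorithm: it assumes
that a page miss with the previous page already cached (the "history
page") triggers a single readahead of `window` pages, and that any other
miss reads one page. The window of 8 pages and both function names are
assumptions chosen to match the "2-9" range quoted above.

```python
def ios_per_page_hint(pattern):
    """Toy model of per-page fadvise: one IO per page not yet cached."""
    cached = set()
    ios = []
    for page in pattern:
        if page not in cached:
            cached.add(page)
            ios.append((page, page))
    return ios


def ios_history_readahead(pattern, window=8):
    """Toy model of miss-driven readahead with a history-page check.

    On a miss, if the previous page is cached, read `window` pages in
    one IO; otherwise read only the missing page. `window` is assumed.
    """
    cached = set()
    ios = []
    for page in pattern:
        if page in cached:
            continue  # cache hit: no IO in this model
        if page - 1 in cached:
            first, last = page, page + window - 1
        else:
            first, last = page, page
        cached.update(range(first, last + 1))
        ios.append((first, last))
    return ios


PATTERN = [1, 3, 2, 6, 4, 9]
fadvise_ios = ios_per_page_hint(PATTERN)
readahead_ios = ios_history_readahead(PATTERN)
print(len(fadvise_ios), fadvise_ios)
print(len(readahead_ios), readahead_ios)
```

In this model the per-page hints issue 6 single-page IOs, while the
history check issues 3: page 1, page 3, and then pages 2-9 in one IO.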

The fix would lie in fadvise, I think. It should update readahead
tracking structures. Alternatively, one could try to do it in
do_generic_file_read, updating readahead on !PageUptodate or even on
page cache hits. I really don't have the expertise or time to go
modifying, building and testing the supposedly quite simple patch that
would fix this. It's mostly about the testing, in fact. So if someone
can comment or try by themselves, I guess it would really benefit
those relying on fadvise to fix this behavior.

One possible solution is to try the context readahead at fadvise time
to check the existence of history pages and do readahead accordingly.

However, it will introduce *real interference* between kernel
readahead and user prefetching. The original scheme is that once user
space starts its own informed prefetching, kernel readahead will
automatically stay out of the way.
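
For reference, user-space informed prefetching of the kind discussed
here might look like the following sketch. It uses Python's
`os.posix_fadvise` wrapper with `POSIX_FADV_WILLNEED`, one call per
page. The 4096-byte page size, the scratch file, and the helper names
are assumptions for illustration; they are not taken from the thread.

```python
import os
import tempfile

PAGE = 4096  # assumed page size, for illustration only


def prefetch_pages(fd, pages, page_size=PAGE):
    """Hint that each listed page will be needed soon; return hints issued."""
    if not hasattr(os, "posix_fadvise"):
        return 0  # platform without posix_fadvise: issue no hints
    for page in pages:
        os.posix_fadvise(fd, page * page_size, page_size,
                         os.POSIX_FADV_WILLNEED)
    return len(pages)


def read_pages(fd, pages, page_size=PAGE):
    """Read the listed pages in order using positional reads."""
    return [os.pread(fd, page_size, page * page_size) for page in pages]


with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(bytes(10 * PAGE))  # 10 zero-filled pages, indexes 0-9
    tmp.flush()
    pattern = [1, 3, 2, 6, 4, 9]
    prefetch_pages(tmp.fileno(), pattern)
    chunks = read_pages(tmp.fileno(), pattern)

print(len(chunks), {len(c) for c in chunks})
```

The hints are advisory: the reads return the same data whether or not
the kernel acts on them, which is why the discussion is about IO
behaviour rather than correctness.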

I understand that would seem like a reasonable design, but in this
particular case it doesn't seem to be. I propose that in most cases it
doesn't really work well as a design decision, to make fadvise work as
direct I/O. Precisely because fadvise is supposed to be a hint to let
the kernel make better decisions, and not a request to make the kernel
stop making decisions.

Any interference so introduced wouldn't be any worse than the
interference introduced by readahead over reads. I agree, if fadvise
were to trigger readahead, it could be bad for applications that don't
read what they say they will.

Right.

But if cache hits were to simply update
readahead state, it would only mean that read calls behave the same
regardless of fadvise calls. I think that's worth pursuing.

Here you are describing an alternative solution that will somehow trap
into the readahead code even when, for example, the application is
accessing an already cached file again and again? I'm afraid this will
add non-trivial overhead and is less attractive than the "readahead
on fadvise" solution.
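
The overhead concern can be made concrete with a small counting sketch.
It is a toy model under stated assumptions, not kernel code: the
function name and the `trap_on_hits` flag are hypothetical, and it only
counts how often a readahead hook would be entered for a file that is
already fully cached.

```python
def readahead_entries(accesses, cached_pages, trap_on_hits):
    """Count entries into a hypothetical readahead hook.

    With trap_on_hits=False the hook runs only on cache misses; with
    trap_on_hits=True it also runs on every cache hit.
    """
    cached = set(cached_pages)
    entries = 0
    for page in accesses:
        hit = page in cached
        cached.add(page)
        if trap_on_hits or not hit:
            entries += 1
    return entries


FILE_PAGES = range(4)                # a 4-page file, fully cached
accesses = list(FILE_PAGES) * 1000   # read the whole file 1000 times

miss_only = readahead_entries(accesses, FILE_PAGES, trap_on_hits=False)
every_access = readahead_entries(accesses, FILE_PAGES, trap_on_hits=True)
print(miss_only, every_access)
```

In this model the miss-only scheme never enters the hook for the cached
file, while updating state on hits enters it once per page access.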

Hi Fengguang,

Page cache sync readahead is only triggered on a cache miss. If the
file is already cached, how can readahead be triggered again when the
application accesses an already cached file again and again?

Regards,
Jaegeuk

I ought to try to prepare a patch for this to illustrate my point. Not
sure I'll be able to though.

I'd be glad to materialize the readahead on fadvise proposal, if there
are no obvious negative examples/cases.

Thanks,
Fengguang

