Re: read()/readv() only from page cache

From: Mel Gorman
Date: Fri Sep 05 2014 - 07:09:35 EST


On Thu, Jul 24, 2014 at 10:36:33PM -0400, Milosz Tanski wrote:
> After spending some time of my own fighting similar problems I figured
> I'd reach out to see if there's something that can be done that can
> make my use case easier. I was wondering if there is a read family
> syscall that allows me to read from a file descriptor only if the data
> is in the page cache (or only the portion of the data is in the page
> cache).
>

I suggest you look at the recent fincore debate. It did not progress much
the last time because the author wanted to push a lot of functionality in
there whereas reviewers felt it should start simple. The simple case is
likely a good fit for what you want. The primary downside is that it would
be race-prone under memory pressure, as the page could be reclaimed
between the fincore check and the read, but I expect that your application
is already avoiding reclaim activity.

Depending on your application, fincore is far cheaper than mincore because
mincore requires the file to be mapped first, which in a threaded
application will crucify performance if called regularly.
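
For illustration, a minimal sketch of what the mincore route looks like
today. The helper name range_is_cached() is hypothetical, and the sketch
assumes Linux with glibc and a page-aligned offset. It only shows why the
file has to be mapped first; it is not a recommended implementation.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* asks glibc to declare mincore() */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not an existing kernel or libc interface.
 *
 * Returns 1 if mincore() reports every page of [off, off + len) in fd
 * as resident, 0 if at least one page is not, and -1 on error.
 * off must be a multiple of the page size.
 */
int range_is_cached(int fd, off_t off, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    if (len == 0 || off % page != 0)
        return -1;

    /* mincore() fills in one status byte per page of the range. */
    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(npages);
    if (vec == NULL)
        return -1;

    /*
     * mincore() works on addresses, not file descriptors, so the range
     * has to be mapped first. The mmap()/munmap() pair changes the
     * address space shared by all threads on every single check.
     */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, off);
    if (map == MAP_FAILED) {
        free(vec);
        return -1;
    }

    int cached = -1;
    if (mincore(map, len, vec) == 0) {
        cached = 1;
        for (size_t i = 0; i < npages; i++) {
            /* Only the least significant bit of each byte is defined. */
            if (!(vec[i] & 1)) {
                cached = 0;
                break;
            }
        }
    }

    munmap(map, len);
    free(vec);
    return cached;
}
```

A caller would do something like "if (range_is_cached(fd, off, len) == 1)
pread(fd, buf, len, off);". The same race applies as with fincore: the
pages can be reclaimed between the check and the read, so a positive
answer is a hint and not a guarantee that the read will not block.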

Technically nothing would prevent the implementation of an fcntl operation
that returned failure from read() when the page is not in the page
cache. However, the use case is so narrow and Linux-specific that it
would encounter resistance to being merged. The likely feedback would be
to implement fincore, or to explain in detail why fincore is not
sufficient, which would be a tough argument to win. You'll get beaten with
the "interfaces are forever and your use case is too specific" stick.

The argument that fincore is an extra syscall is not likely to get much
traction as it'll be pointed out that you are already incurring IPC and
synchronisation overhead. Relative to that, the cost of fincore should
be negligible.

--
Mel Gorman
SUSE Labs