Re: [RFC] kernel facilities for cache prefetching

From: Badari Pulavarty
Date: Wed May 03 2006 - 13:26:47 EST


On Wed, 2006-05-03 at 12:11 +0800, Wu Fengguang wrote:
> On Tue, May 02, 2006 at 08:55:06AM -0700, Linus Torvalds wrote:
> > Doing prefetching on a physical block basis is simply not a valid
> > approach, for several reasons:
>
> Sorry!
> I made a misleading introduction. I'll try to explain it in more detail.
>
> DATA ACQUISITION
>
> /proc/filecache provides an interface to query the cached pages of any
> file. This information is expressed in tuples of <idx, len>, which
> more specifically means <mapping-offset, pages>.
>
> Normally one should use 'echo' to setup two parameters before doing
> 'cat':
> @file
> the filename;
> use 'ALL' to get a list all files cached
> @mask
> only show the pages with non-zero (page-flags & @mask);
> for simplicity, use '0' to show all present pages(take 0 as ~0)
>
> Normally, one should first get the file list using param 'file ALL',
> and then iterate through all the files and pages of interested with
> params 'file filename' and 'mask pagemask'.
>
> The param 'mask' acts as a filter for different users: it allows
> sysadms to know where his memory goes, and the prefetcher to ignore
> pages from false readahead.
>
> One can use 'mask hex(PG_active|PG_referenced|PG_mapped)' in its hex form
> to show only accessed pages(here PG_mapped is a faked flag), and use
> 'mask hex(PG_dirty)' to show only dirtied pages.
>
> One can use
> $ echo "file /sbin/init" > /proc/filecache
> $ echo "mask 0" > /proc/filecache
> $ cat /proc/filecache
> to get an idea which pages of /sbin/init are currently cached.
>
> In the proposal, I used the following example, which is proved to be
> rather misleading:
> $ echo "file /dev/hda1" > /proc/filecache
> $ cat /proc/filecache
> The intention of that example was to show that filesystem dir/inode
> buffer status -- which is the key data for user-land pre-caching --
> can also be retrieved through this interface.
>
> So the proposed solution is to
> - prefetch normal files on the virtual mapping level
> - prefetch fs dir/inode buffers on a physical block basis
>
> I/O SUBMISSION
> How can we avoid unnecessary seeks when prefetching on virtual mapping
> level? The answer is to leave this job to i/o elevators. What we
> should do is to present elevators with most readahead requests before
> too many requests being submitted to disk drivers.
> The proposed scheme is to:
> 1) (first things first)
> issue all readahead requests for filesystem buffers
> 2) (in background, often blocked)
> issue all readahead requests for normal files
> -) make sure the above requests are of really _low_ priority
> 3) regular system boot continues
> 4) promote the priority of any request that is now demanded by
> legacy programs
>
> In the scheme, most work is done by user-land tools. The required
> kernel support is minimal and general-purpose:
> - an /proc/filecache interface
> - the ability to promote I/O priority on demanded pages
>
> By this approach, we avoided the complicated OSX bootcache solution,
> which is a physical-blocks-based, special-handlings-in-kernel solution
> that is exactly what Linus is against.

Wu,

While ago, I hacked up similar /proc interface
echo "<filesystem-name>" > /proc/pagecache-usage

Which showed pagecache usage of every file in that filesystem
(filename, #num pages). My main objective was to shoot down pagecache
for all the files in a given filesystem. I ended up using it to do
posix_fadivse(POSIX_FADV_DONTNEED) on those files. (Initially, tried
to do this without this, by doing fadvise() on all files in the
filesystem - but ended up bloating up inode and dcache).

Yeah. having this kind of information would be useful. But I am not sure
how much of this can benefit regular workloads - unless one is willing
to tweak things heavily. Bottom line is, need to have a better strategy
on how you would use information ..

Thanks,
Badari


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/