userspace pagecache management tool

From: Andrew Morton
Date: Sat Mar 03 2007 - 15:29:54 EST



I've uploaded to http://userweb.kernel.org/~akpm/pagecache-management/ a
little tool which permits the management of the pagecache usage of
arbitrary applications. Effectively it prevents the targeted application
from using any pagecache at all.

It is to address the "waah, backups fill my memory with pagecache" and the
"waah, updatedb swapped everything out" and the "waah, copying a DVD
gobbled all my memory" problems.


Although it is little more than a proof-of-concept, it seems to be fairly
useful. When running

pagecache-management.sh dd if=100-mb-file of=foo
or
pagecache-management.sh cp -a /usr/src/linux-2.6.20 /usr/src/foo

the amount of pagecache in the machine is pretty much unaltered. Maybe a
megabyte of additional cache in the second case, because of ext3 indirect
blocks.
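One rough way to reproduce the "pretty much unaltered" measurement is to sample the Cached: line of /proc/meminfo before and after running the command. The helper below is a minimal sketch of that. The name meminfo_cached_kb() is hypothetical, and the figure is machine-wide, so any other activity on the box will move it too.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the "Cached:" figure from /proc/meminfo
 * in kB, or -1 on error.  Sampling it before and after a command gives
 * a rough, machine-wide measure of how much pagecache was added. */
long meminfo_cached_kb(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    char line[256];
    long kb = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        /* Compare at the start of the line so "SwapCached:" is not
         * mistaken for "Cached:". */
        if (strncmp(line, "Cached:", 7) == 0) {
            if (sscanf(line + 7, "%ld", &kb) != 1)
                kb = -1;
            break;
        }
    }
    fclose(f);
    return kb;
}
```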


The tool uses an LD_PRELOAD hack to intercept glibc's read(), pread(),
write(), pwrite(), close() and dup2() functions. Pagecache control is done
via posix_fadvise() and sync_file_range().
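For readers unfamiliar with the LD_PRELOAD trick, here is a minimal, independent sketch of how a read() interposer of this kind can work; it is not the code from the tool above. It assumes glibc, finds the real read() with dlsym(RTLD_NEXT, "read"), and after each successful read tells the kernel via POSIX_FADV_DONTNEED that the just-read range is no longer needed. The other wrappers (pread(), write(), pwrite(), close(), dup2()) are left out.

```c
#define _GNU_SOURCE            /* for RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* The next definition of read() in symbol lookup order (glibc's). */
static ssize_t (*real_read)(int, void *, size_t);

/* Exported under glibc's own name, so when this file is built as a
 * shared object and loaded with LD_PRELOAD, the application's read()
 * calls land here first. */
ssize_t read(int fd, void *buf, size_t count)
{
    ssize_t ret;
    int saved_errno;
    off_t end;

    if (real_read == NULL)
        real_read = (ssize_t (*)(int, void *, size_t))dlsym(RTLD_NEXT, "read");
    if (real_read == NULL) {   /* could not locate glibc's read() */
        errno = ENOSYS;
        return -1;
    }

    ret = real_read(fd, buf, count);
    if (ret <= 0)
        return ret;

    /* The data now lives in the caller's buffer, so the cached pages
     * behind it can go.  lseek() fails on pipes and sockets, which have
     * no pagecache to drop.  The advice is best effort, and errno is
     * restored so the caller sees only the result of the real read(). */
    saved_errno = errno;
    end = lseek(fd, 0, SEEK_CUR);
    if (end != (off_t)-1)
        posix_fadvise(fd, end - ret, ret, POSIX_FADV_DONTNEED);
    errno = saved_errno;
    return ret;
}
```

Built as a shared object (for example gcc -shared -fPIC -o shim.so shim.c, adding -ldl on glibc older than 2.34; the file names are placeholders), it could be run as LD_PRELOAD=./shim.so dd if=100-mb-file of=foo. Only calls that go through the exported read() symbol are caught; reads that glibc performs internally, for example inside stdio, generally are not.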

btw, for a while I was using fdatasync() on close(), but it was slow,
because fdatasync() has to run an ext3 commit to commit the metadata.
sync_file_range() doesn't do that, and the copy-a-kernel-tree testcase sped
up by a factor of five. So sync_file_range() rocks, but the powerpc guys
haven't wired it up yet.
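The write side of that trade-off might look like the sketch below: push the dirty range out with sync_file_range(), and only then drop it with POSIX_FADV_DONTNEED, because the kernel will not discard pages that are still dirty or under writeback. The helper name pcm_flush_and_drop() is hypothetical, and this is not presented as the tool's actual close() path. Note that sync_file_range() is Linux-specific and, per its man page, writes out no file metadata, so it is not a substitute for fdatasync() when crash safety matters.

```c
#define _GNU_SOURCE            /* for sync_file_range() */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write out the dirty pages in [off, off + len)
 * and then ask the kernel to drop them from the pagecache.  len == 0
 * means "through end of file" for both calls.  Returns 0, or -1 with
 * errno set. */
int pcm_flush_and_drop(int fd, off_t off, off_t len)
{
    int err;

    /* Start writeback of the range and wait for it to complete.
     * Unlike fdatasync(), this touches only data pages: no journal
     * commit, and no guarantee that metadata has reached the disk. */
    if (sync_file_range(fd, off, len,
                        SYNC_FILE_RANGE_WAIT_BEFORE |
                        SYNC_FILE_RANGE_WRITE |
                        SYNC_FILE_RANGE_WAIT_AFTER) != 0)
        return -1;

    /* The pages are clean now, so DONTNEED can actually evict them. */
    err = posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
    if (err != 0) {
        errno = err;           /* posix_fadvise() returns the error number */
        return -1;
    }
    return 0;
}
```

Calling pcm_flush_and_drop(fd, 0, 0) just before close() would flush and drop the whole file.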


There is much more which could be done to make this code smarter, but I
think the lesson here is that we can produce a far, far better result doing
this work in userspace than we could ever hope to do with an in-kernel
implementation. There are some enhancement suggestions in the
documentation file.


It would be good if someone could turn this into a real product and get it
fed into distros. Once the design is settled we should look at moving all the
functionality into glibc itself, IMO, and get rid of the LD_PRELOAD trick.

It might help if the kernel offered APIs which permit userspace to query
the number of resident pages in a file (well, actually it already does,
kind-of: mincore()) and the ability to query the number of dirty pages in a
file, etc. I'd be reluctant to tie the kernel ABI too closely to the
current pagecache implementation and data structures, but we can look at
these things.
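As a sketch of what mincore() already offers, the function below maps a file read-only and counts how many of its pages are resident. The name resident_pages() is hypothetical. mincore() reports residency only and says nothing about whether a page is dirty, which is the gap described above.

```c
#define _DEFAULT_SOURCE        /* for mincore() */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count the pages of the file at `path` that are
 * in the pagecache.  Stores the file size in pages in *total (only
 * meaningful on success).  Returns the resident count, or -1 on error. */
long resident_pages(const char *path, long *total)
{
    long page = sysconf(_SC_PAGESIZE);
    struct stat st;
    unsigned char *vec = NULL;
    void *map = MAP_FAILED;
    long npages, i, resident = -1;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0)
        goto out;
    npages = (st.st_size + page - 1) / page;
    *total = npages;
    if (npages == 0) {         /* mmap() rejects zero-length mappings */
        resident = 0;
        goto out;
    }
    map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out;
    vec = malloc(npages);
    if (vec == NULL)
        goto out;
    /* Creating the mapping does not fault pages in; mincore() only
     * reports whether each page is already in memory. */
    if (mincore(map, st.st_size, vec) != 0)
        goto out;
    resident = 0;
    for (i = 0; i < npages; i++)
        resident += vec[i] & 1;    /* low bit set: page is resident */
out:
    free(vec);
    if (map != MAP_FAILED)
        munmap(map, st.st_size);
    close(fd);
    return resident;
}
```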