Re: mmap vs fs cache

From: Howard Chu
Date: Fri Mar 08 2013 - 15:05:10 EST


Johannes Weiner wrote:
> On Fri, Mar 08, 2013 at 07:00:55AM -0800, Howard Chu wrote:
>> Chris Friesen wrote:
>>> On 03/08/2013 03:40 AM, Howard Chu wrote:
>>>
>>>> There is no way that a process that is accessing only 30GB of a mmap
>>>> should be able to fill up 32GB of RAM. There's nothing else running on
>>>> the machine, I've killed or suspended everything else in userland
>>>> besides a couple shells running top and vmstat. When I manually
>>>> drop_caches repeatedly, then eventually slapd RSS/SHR grows to 30GB and
>>>> the physical I/O stops.
>>>
>>> Is it possible that the kernel is doing some sort of automatic
>>> readahead, but it ends up reading pages corresponding to data that isn't
>>> ever queried and so doesn't get mapped by the application?
>>
>> Yes, that's what I was thinking. I added a
>> posix_madvise(..POSIX_MADV_RANDOM) but that had no effect on the
>> test.
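
For readers unfamiliar with the hint mentioned above, here is a minimal sketch of mapping a file and advising random access. It is an analogue only: Python's mmap.madvise() wraps madvise(2) rather than posix_madvise(), and the helper name map_with_random_advice is hypothetical, not anything from slapd. It assumes Linux and Python 3.8 or later.

```python
import mmap
import os


def map_with_random_advice(path):
    """Map a file read-only and hint that access will be random.

    Returns the mmap object; the caller is responsible for closing it.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        mapping = mmap.mmap(fd, size, access=mmap.ACCESS_READ)
    finally:
        # mmap duplicates the descriptor, so ours can be closed here.
        os.close(fd)
    # MADV_RANDOM asks the kernel not to read ahead around faulting pages.
    # It is only a hint; it does not limit how much page cache is used.
    mapping.madvise(mmap.MADV_RANDOM)
    return mapping
```

The call succeeding says nothing about whether readahead was the cause; as reported above, the hint made no difference in this test.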

>> First obvious conclusion - kswapd is being too aggressive. When free
>> memory hits the low watermark, the reclaim shrinks slapd down from
>> 25GB to 18-19GB, while the page cache still contains ~7GB of
>> unmapped pages. Ideally I'd like a tuning knob so I can say to keep
>> no more than 2GB of unmapped pages in the cache. (And the desired
>> effect of that would be to allow user processes to grow to 30GB
>> total, in this case.)

> We should find out where the unmapped page cache is coming from if you
> are only accessing mapped file cache and disabled readahead.
>
> How do you arrive at this number of unmapped page cache?

This number is pretty obvious. When slapd has grown to 25GB, the page cache has grown to 32GB (less about 200MB, the minfree). So: 7GB unmapped in the cache.
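
The subtraction above (32GB of page cache minus 25GB mapped by slapd) can also be approximated from /proc/meminfo. The sketch below is a rough estimate under stated assumptions: it treats Cached minus Mapped as "unmapped page cache", which ignores shared memory and other details, and the function name is hypothetical.

```python
def unmapped_page_cache_kib(meminfo_text):
    """Estimate unmapped page cache as Cached minus Mapped, in KiB.

    meminfo_text is the content of /proc/meminfo as a string.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields["Cached"] - fields["Mapped"]
```

With illustrative figures of Cached at 32 GiB and Mapped at 25 GiB (rounded from the numbers in this thread, not measured output), the function returns 7340032 KiB, which is 7 GiB.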

> What could happen is that previously used and activated pages do not
> get evicted anymore since there is a constant supply of younger
> reclaimable cache that is actually thrashing. Whenever you drop the
> caches, you get rid of those stale active pages and allow the
> previously thrashing cache to get activated. However, that would
> require that there is already a significant amount of active file
> pages before your workload starts (check the nr_active_file number in
> /proc/vmstat before launching slapd, try sync; echo 3 >drop_caches
> before launching to eliminate this option) OR that the set of pages
> accessed during your workload changes and the combined set of pages
> accessed by your workload is bigger than available memory -- which you
> claimed would not happen because you only access the 30GB file area on
> that system.

There are no other active pages before the test begins. There's nothing else running. Caches have been dropped completely at the beginning.
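
The pre-launch check suggested in the quoted text (reading nr_active_file from /proc/vmstat) could be scripted roughly as follows. This is a sketch, not a tool used in the test; the helper name read_vmstat_counter is hypothetical, and the counter is reported in pages, not bytes.

```python
def read_vmstat_counter(vmstat_text, name):
    """Return the integer value of one counter from /proc/vmstat text.

    Each line of /proc/vmstat has the form "<name> <value>".
    Raises KeyError if the counter is not present.
    """
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == name:
            return int(parts[1])
    raise KeyError(name)
```

On a Linux machine one would pass in the result of open("/proc/vmstat").read() and compare the value just before launching slapd with the value after dropping caches.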

The test clearly is accessing only 30GB of data. Once slapd reaches this process size, the test can be stopped and restarted any number of times, run for any number of hours continuously, and memory use on the system is unchanged, and no pageins occur.

--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/