Re: Unhelpful caching decisions, possibly related to active/inactive sizing

From: Rik van Riel
Date: Thu Feb 11 2016 - 15:34:16 EST


On Tue, 9 Feb 2016 17:42:56 -0500
Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> On Tue, Feb 09, 2016 at 05:52:40PM +0100, Andres Freund wrote:

> > Rik asked me about active/inactive sizing in /proc/meminfo:
> > Active: 7860556 kB
> > Inactive: 5395644 kB
> > Active(anon): 2874936 kB
> > Inactive(anon): 432308 kB
> > Active(file): 4985620 kB
> > Inactive(file): 4963336 kB

> Yes, a generous minimum size of the inactive list made sense when it
> was the exclusive staging area to tell use-once pages from use-many
> pages. Now that we have refault information to detect use-many with
> arbitrary inactive list size, this minimum is no longer reasonable.
>
> The new minimum should be smaller, but big enough for applications to
> actually use the data in their pages between fault and eviction
> (i.e. it needs to take the aggregate readahead window into account),
> and big enough for active pages that are speculatively challenged
> during workingset changes to get re-activated without incurring IO.
>
> However, I don't think it makes sense to dynamically adjust the
> balance between the active and the inactive cache during refaults.

Johannes, does this patch look ok to you?

Andres, does this patch work for you?

-----8<-----
Subject: mm,vmscan: reduce size of inactive file list

The inactive file list should still be large enough to contain
readahead windows and freshly written file data, but it is no
longer the only source for detecting multiple accesses to
file pages. The workingset refault measurement code causes
recently evicted file pages that get accessed again after a
short interval to be promoted directly to the active list.

With that mechanism in place, we can afford, on a larger
system, to dedicate more memory to the active file list, so we
can actually cache more of the frequently used file pages
in memory, and not have them pushed out by streaming writes,
once-used streaming file reads, etc.

This can help things like database workloads, where only
half the page cache can currently be used to cache the
database working set. This patch automatically increases
that fraction on larger systems, using the same ratio that
has already been used for anonymous memory.

Signed-off-by: Rik van Riel <riel@xxxxxxxxxx>
Reported-by: Andres Freund <andres@xxxxxxxxxxx>
---
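For reference, a rough userspace sketch of what the changed check does.
It assumes zone->inactive_ratio is still set up the way
calculate_zone_inactive_ratio() in mm/page_alloc.c does it for
anonymous memory, i.e. int_sqrt(10 * zone size in GB), or 1 for zones
smaller than 1GB. The helper names below are made up for illustration
and are not kernel functions; the sizes in the usage note are the
system-wide /proc/meminfo numbers quoted above, not per-lruvec page
counts.

```c
#include <assert.h>
#include <stdbool.h>

/* Floor of the square root; a simple stand-in for the kernel's int_sqrt(). */
static unsigned long isqrt_floor(unsigned long x)
{
	unsigned long r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/*
 * Hypothetical helper: the ratio for a zone of the given size in GB,
 * assuming the same formula that is used for the anon inactive ratio.
 */
static unsigned long inactive_ratio_for_gb(unsigned long gb)
{
	unsigned long ratio = gb ? isqrt_floor(10 * gb) : 1;

	/* The ratio never drops below 1, so small zones keep the old 1:1 split. */
	assert(ratio >= 1);
	return ratio;
}

/* Old check: the inactive file list is "low" once active outgrows it. */
static bool inactive_file_is_low_old(unsigned long inactive,
				     unsigned long active)
{
	return active > inactive;
}

/* New check: the active list may be up to ratio times the inactive list. */
static bool inactive_file_is_low_new(unsigned long inactive,
				     unsigned long active,
				     unsigned long ratio)
{
	return inactive * ratio < active;
}
```

With Inactive(file) at 4963336 kB and Active(file) at 4985620 kB, the
old check reports the inactive list as low, so the active list gets
deactivated from. With a ratio of 3 (what a 1GB zone would get under
the assumption above) the new check does not, and the active list is
left alone until it grows past three times the inactive list.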
mm/vmscan.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index eb3dd37ccd7c..0a316c41bf80 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1928,13 +1928,14 @@ static inline bool inactive_anon_is_low(struct lruvec *lruvec)
  */
 static bool inactive_file_is_low(struct lruvec *lruvec)
 {
+	struct zone *zone = lruvec_zone(lruvec);
 	unsigned long inactive;
 	unsigned long active;

 	inactive = get_lru_size(lruvec, LRU_INACTIVE_FILE);
 	active = get_lru_size(lruvec, LRU_ACTIVE_FILE);

-	return active > inactive;
+	return inactive * zone->inactive_ratio < active;
 }

 static bool inactive_list_is_low(struct lruvec *lruvec, enum lru_list lru)