Re: [RFC] Tracking mlocked pages and moving them off the LRU

From: Christoph Lameter
Date: Sat Feb 03 2007 - 14:05:47 EST


Here is the second piece, which removes mlocked pages from the LRU during
scanning. I tried moving them to a separate list, but that runs into
locking issues. We do not need the list, though, since we will encounter
the page again anyway during zap_pte_range.

However, in zap_pte_range we run into another problem: multiple
zap_pte_range calls may handle the same page, and without a page flag or a
scan over all the vmas we cannot determine whether the page should be
moved back to the LRU. As a result this patch may decrement NR_MLOCK too
often, so that it goes below zero. Any ideas on how to fix this without a
page flag and a scan over the vmas?

Plus there is the issue that NR_MLOCK is only updated during reclaim,
when we may already be in trouble. An app may mlock huge amounts of memory
while NR_MLOCK stays low. If memory gets too low, NR_MLOCK suddenly
becomes accurate, and the VM is likely undergoing a shock from that
discovery (should we actually use NR_MLOCK elsewhere to determine memory
management behavior). Hopefully we will not fall over then.

Maybe the best approach would be to handle the counter separately via a
page flag? But then we are back to ugly vma scans. Yuck.

Index: current/mm/vmscan.c
===================================================================
--- current.orig/mm/vmscan.c 2007-02-03 10:53:15.000000000 -0800
+++ current/mm/vmscan.c 2007-02-03 10:53:25.000000000 -0800
@@ -516,10 +516,11 @@ static unsigned long shrink_page_list(st
if (page_mapped(page) && mapping) {
switch (try_to_unmap(page, 0)) {
case SWAP_FAIL:
- case SWAP_MLOCK:
goto activate_locked;
case SWAP_AGAIN:
goto keep_locked;
+ case SWAP_MLOCK:
+ goto mlocked;
case SWAP_SUCCESS:
; /* try to free the page below */
}
@@ -594,6 +595,11 @@ free_it:
__pagevec_release_nonlru(&freed_pvec);
continue;

+mlocked:
+ unlock_page(page);
+ __inc_zone_page_state(page, NR_MLOCK);
+ continue;
+
activate_locked:
SetPageActive(page);
pgactivate++;
Index: current/mm/memory.c
===================================================================
--- current.orig/mm/memory.c 2007-02-03 10:52:37.000000000 -0800
+++ current/mm/memory.c 2007-02-03 10:53:25.000000000 -0800
@@ -682,6 +682,10 @@ static unsigned long zap_pte_range(struc
file_rss--;
}
page_remove_rmap(page, vma);
+ if (vma->vm_flags & VM_LOCKED) {
+ __dec_zone_page_state(page, NR_MLOCK);
+ lru_cache_add_active(page);
+ }
tlb_remove_page(tlb, page);
continue;
}
Index: current/drivers/base/node.c
===================================================================
--- current.orig/drivers/base/node.c 2007-02-03 10:52:35.000000000 -0800
+++ current/drivers/base/node.c 2007-02-03 10:53:25.000000000 -0800
@@ -60,6 +60,7 @@ static ssize_t node_read_meminfo(struct
"Node %d FilePages: %8lu kB\n"
"Node %d Mapped: %8lu kB\n"
"Node %d AnonPages: %8lu kB\n"
+ "Node %d Mlock: %8lu kB\n"
"Node %d PageTables: %8lu kB\n"
"Node %d NFS_Unstable: %8lu kB\n"
"Node %d Bounce: %8lu kB\n"
@@ -82,6 +83,7 @@ static ssize_t node_read_meminfo(struct
nid, K(node_page_state(nid, NR_FILE_PAGES)),
nid, K(node_page_state(nid, NR_FILE_MAPPED)),
nid, K(node_page_state(nid, NR_ANON_PAGES)),
+ nid, K(node_page_state(nid, NR_MLOCK)),
nid, K(node_page_state(nid, NR_PAGETABLE)),
nid, K(node_page_state(nid, NR_UNSTABLE_NFS)),
nid, K(node_page_state(nid, NR_BOUNCE)),
Index: current/fs/proc/proc_misc.c
===================================================================
--- current.orig/fs/proc/proc_misc.c 2007-02-03 10:52:36.000000000 -0800
+++ current/fs/proc/proc_misc.c 2007-02-03 10:53:25.000000000 -0800
@@ -166,6 +166,7 @@ static int meminfo_read_proc(char *page,
"Writeback: %8lu kB\n"
"AnonPages: %8lu kB\n"
"Mapped: %8lu kB\n"
+ "Mlock: %8lu kB\n"
"Slab: %8lu kB\n"
"SReclaimable: %8lu kB\n"
"SUnreclaim: %8lu kB\n"
@@ -196,6 +197,7 @@ static int meminfo_read_proc(char *page,
K(global_page_state(NR_WRITEBACK)),
K(global_page_state(NR_ANON_PAGES)),
K(global_page_state(NR_FILE_MAPPED)),
+ K(global_page_state(NR_MLOCK)),
K(global_page_state(NR_SLAB_RECLAIMABLE) +
global_page_state(NR_SLAB_UNRECLAIMABLE)),
K(global_page_state(NR_SLAB_RECLAIMABLE)),
Index: current/include/linux/mmzone.h
===================================================================
--- current.orig/include/linux/mmzone.h 2007-02-03 10:52:35.000000000 -0800
+++ current/include/linux/mmzone.h 2007-02-03 10:53:25.000000000 -0800
@@ -58,6 +58,7 @@ enum zone_stat_item {
NR_FILE_DIRTY,
NR_WRITEBACK,
/* Second 128 byte cacheline */
+ NR_MLOCK, /* Mlocked pages */
NR_SLAB_RECLAIMABLE,
NR_SLAB_UNRECLAIMABLE,
NR_PAGETABLE, /* used for pagetables */
Index: current/mm/vmstat.c
===================================================================
--- current.orig/mm/vmstat.c 2007-02-03 10:52:36.000000000 -0800
+++ current/mm/vmstat.c 2007-02-03 10:53:25.000000000 -0800
@@ -439,6 +439,7 @@ static const char * const vmstat_text[]
"nr_file_pages",
"nr_dirty",
"nr_writeback",
+ "nr_mlock",
"nr_slab_reclaimable",
"nr_slab_unreclaimable",
"nr_page_table_pages",