Re: [Question] ksm: rmap_item pointing to some stale vmas
From: Susheel Khiani
Date: Tue Jun 09 2015 - 14:26:58 EST
On 4/30/2015 11:37 AM, Susheel Khiani wrote:
But if I've misunderstood, and you think that what you're seeing
fits with the transient forking bugs I've (not quite) described,
and you can explain why even the transient case is important for
you to have fixed, then I really ought to redouble my efforts.
Hugh
I was able to root-cause the issue, since we hit a few instances of it
and it was frequently reproducible under stress tests. The reason it
was important for us to have this fixed is that failure to unmap a KSM
page was resulting in CMA allocation failures.
For cases like fork, what we observed is that for privately mapped
file pages, the stable_node pointed to by a KSM page won't cover all
the mappings until ksmd completes one full scan. Only after that scan
do new rmap_items pointing to the mappings in the child process come
into existence. So in cases like CMA allocation, where we can't wait
for ksmd to complete a full cycle, we can instead traverse the
anon_vma tree from the parent's anon_vma to find all the vmas where
the CMA page is mapped.
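For reference, the data structures involved look roughly like this (a
simplified sketch of the 3.10-era definitions in mm/ksm.c and
include/linux/rmap.h; locking, refcounting and the unstable-tree
fields are left out):

/* mm/ksm.c (simplified): the reverse map KSM keeps for a merged page */
struct stable_node {
	struct rb_node node;		/* stable tree linkage */
	struct hlist_head hlist;	/* rmap_items, one per known mapping */
	unsigned long kpfn;		/* pfn of the shared KSM page */
};

struct rmap_item {
	struct mm_struct *mm;		/* mm this mapping was found in */
	unsigned long address;		/* user address of the mapping */
	struct anon_vma *anon_vma;	/* anon_vma of that vma, when stable */
	struct hlist_node hlist;	/* link into stable_node->hlist */
	/* unstable-tree fields omitted */
};

/* include/linux/rmap.h (simplified) */
struct anon_vma {
	struct anon_vma *root;		/* root of the hierarchy; for a forked
					 * child this is the parent's anon_vma */
	struct rb_root rb_root;		/* interval tree of anon_vma_chains for
					 * every vma that may hold its pages */
	/* rwsem, refcount omitted */
};

New rmap_items are only hung off stable_node->hlist when ksmd next
scans the child's mm, which is why walking that hlist alone can miss
freshly forked mappings. The root anon_vma's interval tree, by
contrast, is already updated at fork time via anon_vma_fork() and
anon_vma_clone().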
I have tested the following patch on a 3.10 kernel, and with this
change I am able to avoid the CMA allocation failures we were
otherwise seeing frequently because of the inability to unmap the KSM
page. Please review and let me know your feedback.
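The core of the change, in isolation: on an extra final pass, each of
the three KSM rmap walkers switches from the interval tree of the
anon_vma recorded in the rmap_item to the tree hanging off
anon_vma->root. Roughly (a hypothetical helper for illustration only;
the patch below open-codes this in page_referenced_ksm(),
try_to_unmap_ksm() and rmap_walk_ksm()):

/*
 * Illustration only, not part of the patch.  Pick which interval tree
 * to walk for a given rmap_item's anon_vma: the anon_vma's own tree on
 * the normal passes, the root anon_vma's tree once search_from_root is
 * set, so that vmas created by fork after the last ksmd scan are seen.
 */
static struct rb_root *ksm_choose_tree(struct anon_vma *anon_vma,
				       int search_from_root)
{
	if (search_from_root && anon_vma->root)
		return &anon_vma->root->rb_root;
	return &anon_vma->rb_root;
}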
[PATCH] ksm: Traverse through parent's anon_vma while unmapping
While doing try_to_unmap_ksm(), we traverse the rmap_item list to
find all the anon_vmas from which the page needs to be unmapped.

By design, KSM builds up its data structures by looking into each
mm, and comes back a cycle later to find out which of those data
structures are now outdated and need to be updated. So for cases
like fork, what we observe is that for privately mapped file pages
the stable_node pointed to by a KSM page won't cover all the
mappings until ksmd completes one full scan. Only after that scan
do new rmap_items pointing to the mappings in the child process
come into existence.

As a result, a stable page can't be fully unmapped until ksmd has
completed one full scan. This becomes an issue for CMA, where we
need to unmap and move a CMA page and can't wait for ksmd to
complete a cycle. Because the rmap_items for the new mappings have
not been created yet, we fail to unmap the CMA page from all the
vmas where it is mapped, which results in frequent CMA allocation
failures.

So instead of relying only on the rmap_item list, which we know can
be incomplete, we also scan the anon_vma tree from the parent's
(root) anon_vma to find all the vmas where the CMA page is mapped,
and can thereby successfully unmap the page and move it to a new
page.
Change-Id: I97cacf6a73734b10c7098362c20fb3f2d4040c76
Signed-off-by: Susheel Khiani <skhiani@xxxxxxxxxxxxxx>
---
mm/ksm.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 55 insertions(+), 3 deletions(-)
diff --git a/mm/ksm.c b/mm/ksm.c
index 11f6293..10d5266 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1956,6 +1956,7 @@ int page_referenced_ksm(struct page *page, struct mem_cgroup *memcg,
unsigned int mapcount = page_mapcount(page);
int referenced = 0;
int search_new_forks = 0;
+ int search_from_root = 0;
VM_BUG_ON(!PageKsm(page));
VM_BUG_ON(!PageLocked(page));
@@ -1968,9 +1969,20 @@ again:
struct anon_vma *anon_vma = rmap_item->anon_vma;
struct anon_vma_chain *vmac;
struct vm_area_struct *vma;
+ struct rb_root rb_root;
+
+ if (!search_from_root) {
+ if (anon_vma)
+ rb_root = anon_vma->rb_root;
+ }
+ else {
+ if (anon_vma && anon_vma->root) {
+ rb_root = anon_vma->root->rb_root;
+ }
+ }
anon_vma_lock_read(anon_vma);
- anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
+ anon_vma_interval_tree_foreach(vmac, &rb_root,
0, ULONG_MAX) {
vma = vmac->vma;
if (rmap_item->address < vma->vm_start ||
@@ -1999,6 +2011,11 @@ again:
}
if (!search_new_forks++)
goto again;
+
+ if (!search_from_root++) {
+ search_new_forks = 0;
+ goto again;
+ }
out:
return referenced;
}
@@ -2010,6 +2027,7 @@ int try_to_unmap_ksm(struct page *page, enum ttu_flags flags,
struct rmap_item *rmap_item;
int ret = SWAP_AGAIN;
int search_new_forks = 0;
+ int search_from_root = 0;
VM_BUG_ON(!PageKsm(page));
VM_BUG_ON(!PageLocked(page));
@@ -2028,9 +2046,20 @@ again:
struct anon_vma *anon_vma = rmap_item->anon_vma;
struct anon_vma_chain *vmac;
struct vm_area_struct *vma;
+ struct rb_root rb_root;
+
+ if (!search_from_root) {
+ if (anon_vma)
+ rb_root = anon_vma->rb_root;
+ }
+ else {
+ if (anon_vma && anon_vma->root) {
+ rb_root = anon_vma->root->rb_root;
+ }
+ }
anon_vma_lock_read(anon_vma);
- anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
+ anon_vma_interval_tree_foreach(vmac, &rb_root,
0, ULONG_MAX) {
vma = vmac->vma;
if (rmap_item->address < vma->vm_start ||
@@ -2056,6 +2085,11 @@ again:
}
if (!search_new_forks++)
goto again;
+
+ if (!search_from_root++) {
+ search_new_forks = 0;
+ goto again;
+ }
out:
return ret;
}
@@ -2068,6 +2102,7 @@ int rmap_walk_ksm(struct page *page, int (*rmap_one)(struct page *,
struct rmap_item *rmap_item;
int ret = SWAP_AGAIN;
int search_new_forks = 0;
+ int search_from_root = 0;
VM_BUG_ON(!PageKsm(page));
VM_BUG_ON(!PageLocked(page));
@@ -2080,9 +2115,21 @@ again:
struct anon_vma *anon_vma = rmap_item->anon_vma;
struct anon_vma_chain *vmac;
struct vm_area_struct *vma;
+ struct rb_root rb_root;
+
+ if (!search_from_root) {
+ if (anon_vma)
+ rb_root = anon_vma->rb_root;
+ }
+ else {
+ if (anon_vma && anon_vma->root) {
+ rb_root = anon_vma->root->rb_root;
+ }
+ }
+
anon_vma_lock_read(anon_vma);
- anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
+ anon_vma_interval_tree_foreach(vmac, &rb_root,
0, ULONG_MAX) {
vma = vmac->vma;
if (rmap_item->address < vma->vm_start ||
@@ -2107,6 +2154,11 @@ again:
}
if (!search_new_forks++)
goto again;
+
+ if (!search_from_root++) {
+ search_new_forks = 0;
+ goto again;
+ }
out:
return ret;
}
--
1.8.2.1
--
Susheel Khiani
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/