[PATCH] memcg: unlock page before charging it.

From: KAMEZAWA Hiroyuki
Date: Thu Jun 23 2011 - 02:05:57 EST


Currently we keep the faulted page locked throughout the whole __do_fault
call (except for the page_mkwrite code path). If we do an early COW, we
allocate a new page which has to be charged to a memcg
(mem_cgroup_newpage_charge).

This function, however, might block for an unbounded amount of time if the
memcg oom killer is disabled or a fork-bomb is running, because the only
way out of the OOM situation is either an external event or a fix of the
OOM situation itself.

In the meantime the faulted page stays locked, which blocks other
processes from faulting it in. This is not good at all because we are
basically punishing a potentially unrelated process for an OOM condition
in a different group (I have seen a stuck system because of ld-2.11.1.so
being locked).

This is easy to reproduce:
% cgcreate -g memory:A
% cgset -r memory.limit_in_bytes=64M A
% cgset -r memory.memsw.limit_in_bytes=64M A
% cd kernel_dir; cgexec -g memory:A make -j

Then the whole system livelocks until you kill 'make -j' by hand (or
reboot). This is because some important pages in a shared library are
locked and never released because of the fork-bomb.

This patch delays the charge until after unlock_page() has been called.
This is safe as long as we keep a reference on the page (memcg doesn't
require the page lock).

With this change, the livelock above disappears.

Reported-by: Lutz Vieweg <lvml@xxxxxx>
Original-idea-by: Michal Hocko <mhocko@xxxxxxx>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
---
mm/memory.c | 28 +++++++++++++++++++---------
1 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 87d9353..66442da 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3129,7 +3129,7 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
struct page *page;
pte_t entry;
int anon = 0;
- int charged = 0;
+ struct page *need_charge = NULL;
struct page *dirty_page = NULL;
struct vm_fault vmf;
int ret;
@@ -3177,12 +3177,7 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
ret = VM_FAULT_OOM;
goto out;
}
- if (mem_cgroup_newpage_charge(page, mm, GFP_KERNEL)) {
- ret = VM_FAULT_OOM;
- page_cache_release(page);
- goto out;
- }
- charged = 1;
+ need_charge = page;
copy_user_highpage(page, vmf.page, address, vma);
__SetPageUptodate(page);
} else {
@@ -3251,12 +3246,11 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
/* no need to invalidate: a not-present page won't be cached */
update_mmu_cache(vma, address, page_table);
} else {
- if (charged)
- mem_cgroup_uncharge_page(page);
if (anon)
page_cache_release(page);
else
anon = 1; /* no anon but release faulted_page */
+ need_charge = NULL;
}

pte_unmap_unlock(page_table, ptl);
@@ -3268,6 +3262,17 @@ out:
if (set_page_dirty(dirty_page))
page_mkwrite = 1;
unlock_page(dirty_page);
+ if (need_charge) {
+ /*
+ * charge this page before we drop refcnt.
+ * memory cgroup returns OOM condition when
+ * this task is killed. So, it's not necessary
+ * to undo.
+ */
+ if (mem_cgroup_newpage_charge(need_charge,
+ mm, GFP_KERNEL))
+ ret = VM_FAULT_OOM;
+ }
put_page(dirty_page);
if (page_mkwrite && mapping) {
/*
@@ -3282,6 +3287,11 @@ out:
file_update_time(vma->vm_file);
} else {
unlock_page(vmf.page);
+ if (need_charge) {
+ if (mem_cgroup_newpage_charge(need_charge,
+ mm, GFP_KERNEL))
+ ret = VM_FAULT_OOM;
+ }
if (anon)
page_cache_release(vmf.page);
}
--
1.7.4.1



