[PATCH] memcg: notify userspace about OOM when and action is due

From: Michal Hocko
Date: Thu Dec 12 2013 - 05:50:17 EST


Userspace is currently notified about OOM condition after fails
to reclaim any memory after MEM_CGROUP_RECLAIM_RETRIES rounds.
This usually means that the memcg is really in troubles and an
OOM action (either done by userspace or kernel) has to be taken.
The kernel OOM killer however bails out and doesn't kill anything
if it sees an already dying/exiting task in a good hope a memory
will be released and OOM situation will be resolved.

Therefore it makes sense to notify userspace only after really all
measures have been taken and an userspace action is required or
the kernel kills a task.

This patch also removes fatal_signal_pending and PF_EXITING check from
mem_cgroup_oom_synchronize because __mem_cgroup_try_charge already
checks for both and bypasses charge so we cannot end up in the oom path.

Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
---
mm/memcontrol.c | 17 ++++-------------
mm/oom_kill.c | 5 +++++
2 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 98900c070045..af7148c77bac 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2235,16 +2235,6 @@ bool mem_cgroup_oom_synchronize(bool handle)
if (!handle)
goto cleanup;

- /*
- * If current has a pending SIGKILL or is exiting, then automatically
- * select it. The goal is to allow it to allocate so that it may
- * quickly exit and free its memory.
- */
- if (fatal_signal_pending(current) || current->flags & PF_EXITING) {
- set_thread_flag(TIF_MEMDIE);
- goto cleanup;
- }
-
owait.memcg = memcg;
owait.wait.flags = 0;
owait.wait.func = memcg_oom_wake_function;
@@ -2256,15 +2246,16 @@ bool mem_cgroup_oom_synchronize(bool handle)

locked = mem_cgroup_oom_trylock(memcg);

- if (locked)
- mem_cgroup_oom_notify(memcg);
-
if (locked && !memcg->oom_kill_disable) {
mem_cgroup_unmark_under_oom(memcg);
finish_wait(&memcg_oom_waitq, &owait.wait);
+ /* calls mem_cgroup_oom_notify if there is a task to kill */
mem_cgroup_out_of_memory(memcg, current->memcg_oom.gfp_mask,
current->memcg_oom.order);
} else {
+ if (locked && memcg->oom_kill_disable)
+ mem_cgroup_oom_notify(memcg);
+
schedule();
mem_cgroup_unmark_under_oom(memcg);
finish_wait(&memcg_oom_waitq, &owait.wait);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1e4a600a6163..47c9de8da36d 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -394,6 +394,8 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
dump_tasks(memcg, nodemask);
}

+extern void mem_cgroup_oom_notify(struct mem_cgroup *memcg);
+
#define K(x) ((x) << (PAGE_SHIFT-10))
/*
* Must be called while holding a reference to p, which will be released upon
@@ -470,6 +472,9 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
victim = p;
}

+ if (memcg)
+ mem_cgroup_oom_notify(memcg);
+
/* mm cannot safely be dereferenced after task_unlock(victim) */
mm = victim->mm;
pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
--
1.8.4.4

--
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/