[PATCH] oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task

From: Michal Hocko
Date: Wed Feb 17 2016 - 04:40:41 EST


Tetsuo has reported that oom_kill_allocating_task=1 will cause
oom_reaper_list corruption because oom_kill_process doesn't follow
the standard OOM exclusion (i.e. it ignores TIF_MEMDIE) and so allows
the same task to be enqueued multiple times - e.g. by sacrificing the
same child multiple times. Let's work around this issue for now until
we decide how to handle oom_kill_allocating_task properly (should it
sacrifice children at all?) or come up with some other protection.
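
To illustrate why enqueuing the same entry twice is a problem, here is a
minimal userspace sketch. It is not kernel code: struct node, node_add()
and the other names are hypothetical stand-ins, and it assumes the queue
is a circular doubly linked list with list_add()-style insertion at the
head and no "already queued" check.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a kernel-style list node (hypothetical name). */
struct node {
	struct node *next, *prev;
};

/* An empty circular list points at itself in both directions. */
static void node_init(struct node *head)
{
	head->next = head;
	head->prev = head;
}

/* Link entry directly after head; nothing checks whether it is queued already. */
static void node_add(struct node *entry, struct node *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/*
 * Count entries by walking forward from head. The walk gives up after
 * limit steps so that a corrupted list cannot hang the caller.
 */
static int walk_length(const struct node *head, int limit)
{
	const struct node *pos;
	int n = 0;

	assert(limit > 0);
	for (pos = head->next; pos != head && n < limit; pos = pos->next)
		n++;
	return n;
}

/*
 * Queue two entries, then queue the second one again, and return the
 * walk length observed afterwards. Callers must pass limit > 2.
 */
static int walk_after_double_enqueue(int limit)
{
	struct node queue, a, b;

	node_init(&queue);
	node_add(&a, &queue);
	node_add(&b, &queue);
	assert(walk_length(&queue, limit) == 2);

	/* Second enqueue of b: b->next is overwritten with b itself. */
	node_add(&b, &queue);
	return walk_length(&queue, limit);
}
```

In this sketch walk_after_double_enqueue(16) returns 16, i.e. the walk
hits the step cap instead of finding two entries: after the second
node_add() the entry b points at itself, so a forward walk never gets
back to the head and a is no longer reachable.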

Reported-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
---
mm/oom_kill.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 7e9953a64489..078e07ec0906 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -678,7 +678,14 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	unsigned int victim_points = 0;
 	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
 					      DEFAULT_RATELIMIT_BURST);
-	bool can_oom_reap = true;
+	bool can_oom_reap;
+
+	/*
+	 * XXX: oom_kill_allocating_task doesn't follow normal OOM exclusion
+	 * and so the same task might enter oom_kill_process which oom_reaper
+	 * cannot handle currently.
+	 */
+	can_oom_reap = !sysctl_oom_kill_allocating_task;
 
 	/*
 	 * If the task is already exiting, don't alarm the sysadmin or kill
--
2.7.0

--
Michal Hocko
SUSE Labs