Re: Possible race condition in oom-killer

From: Tetsuo Handa
Date: Sat Jul 29 2017 - 00:32:39 EST


Michal Hocko wrote:
> On Fri 28-07-17 22:55:51, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Fri 28-07-17 22:15:01, Tetsuo Handa wrote:
> > > > task_will_free_mem(current) in out_of_memory() returning false due to
> > > > MMF_OOM_SKIP already set allowed each thread sharing that mm to select a new
> > > > OOM victim. If task_will_free_mem(current) in out_of_memory() did not return
> > > > false, threads sharing MMF_OOM_SKIP mm would not have selected new victims
> > > > to the level where all OOM killable processes are killed and calls panic().
> > >
> > > I am not sure I understand. Do you mean this?
> >
> > Yes.
> >
> > > ---
> > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > > index 9e8b4f030c1c..671e4a4107d0 100644
> > > --- a/mm/oom_kill.c
> > > +++ b/mm/oom_kill.c
> > > @@ -779,13 +779,6 @@ static bool task_will_free_mem(struct task_struct *task)
> > > if (!__task_will_free_mem(task))
> > > return false;
> > >
> > > - /*
> > > - * This task has already been drained by the oom reaper so there are
> > > - * only small chances it will free some more
> > > - */
> > > - if (test_bit(MMF_OOM_SKIP, &mm->flags))
> > > - return false;
> > > -
> > > if (atomic_read(&mm->mm_users) <= 1)
> > > return true;
> > >
> > > If yes I would have to think about this some more because that might
> > > have weird side effects (e.g. oom_victims counting after threads passed
> > > exit_oom_victim).
> >
> > But this check should not be removed unconditionally. We should still return
> > false if returning true was not sufficient to solve the OOM situation, for
> > we need to select next OOM victim in that case.
> >

I think that below one can manage this race condition.

---
include/linux/sched.h | 1 +
mm/oom_kill.c | 21 ++++++++++++++-------
2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0db4870..3fccf72 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -652,6 +652,7 @@ struct task_struct {
/* disallow userland-initiated cgroup migration */
unsigned no_cgroup_migration:1;
#endif
+ unsigned oom_kill_free_check_raced:1;

unsigned long atomic_flags; /* Flags requiring atomic access. */

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 9e8b4f0..a093193 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -779,13 +779,6 @@ static bool task_will_free_mem(struct task_struct *task)
if (!__task_will_free_mem(task))
return false;

- /*
- * This task has already been drained by the oom reaper so there are
- * only small chances it will free some more
- */
- if (test_bit(MMF_OOM_SKIP, &mm->flags))
- return false;
-
if (atomic_read(&mm->mm_users) <= 1)
return true;

@@ -806,6 +799,20 @@ static bool task_will_free_mem(struct task_struct *task)
}
rcu_read_unlock();

+ /*
+ * It is possible that current thread fails to try allocation from
+ * memory reserves if the OOM reaper set MMF_OOM_SKIP on this mm before
+ * current thread calls out_of_memory() in order to get TIF_MEMDIE.
+ * In that case, allow current thread to try TIF_MEMDIE allocation
+ * before start selecting next OOM victims.
+ */
+ if (ret && test_bit(MMF_OOM_SKIP, &mm->flags)) {
+ if (task == current && !task->oom_kill_free_check_raced)
+ task->oom_kill_free_check_raced = true;
+ else
+ ret = false;
+ }
+
return ret;
}

--
1.8.3.1

What is "oom_victims counting after threads passed exit_oom_victim" ?