Re: Possible race condition in oom-killer
From: Tetsuo Handa
Date: Fri Jul 28 2017 - 09:56:04 EST
Michal Hocko wrote:
> On Fri 28-07-17 22:15:01, Tetsuo Handa wrote:
> > task_will_free_mem(current) in out_of_memory() returning false because
> > MMF_OOM_SKIP was already set allowed each thread sharing that mm to select a
> > new OOM victim. If task_will_free_mem(current) in out_of_memory() had not
> > returned false, threads sharing the MMF_OOM_SKIP mm would not have kept
> > selecting new victims until all OOM-killable processes were killed and
> > panic() was called.
>
> I am not sure I understand. Do you mean this?
Yes.
> ---
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 9e8b4f030c1c..671e4a4107d0 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -779,13 +779,6 @@ static bool task_will_free_mem(struct task_struct *task)
>  	if (!__task_will_free_mem(task))
>  		return false;
>
> -	/*
> -	 * This task has already been drained by the oom reaper so there are
> -	 * only small chances it will free some more
> -	 */
> -	if (test_bit(MMF_OOM_SKIP, &mm->flags))
> -		return false;
> -
>  	if (atomic_read(&mm->mm_users) <= 1)
>  		return true;
>
> If yes I would have to think about this some more because that might
> have weird side effects (e.g. oom_victims counting after threads passed
> exit_oom_victim).
However, this check should not be removed unconditionally. We should still
return false if returning true was not sufficient to resolve the OOM
situation, because we need to select the next OOM victim in that case.
>
> Anyway the proper fix for this is to allow reaping mlocked pages.
A different approach is to set TIF_MEMDIE on all threads sharing the same
mm, so that threads sharing an MMF_OOM_SKIP mm do not need to call
out_of_memory() in order to get TIF_MEMDIE.
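A minimal user-space sketch of that idea, assuming threads are identified by a hypothetical mm_id and a bool stands in for TIF_MEMDIE: every thread sharing the victim's mm is marked at once, so none of them has to enter out_of_memory() just to obtain the flag. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; not the kernel's task_struct. */
struct model_thread {
	int mm_id;      /* threads with equal mm_id share one mm */
	bool memdie;    /* stands in for TIF_MEMDIE */
};

/*
 * Mark every thread that shares the victim's mm, not only the victim
 * itself. Returns how many threads were newly marked.
 */
static size_t model_mark_victim_mm(struct model_thread *threads, size_t n,
				   int victim_mm_id)
{
	size_t marked = 0;

	assert(threads != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (threads[i].mm_id == victim_mm_id && !threads[i].memdie) {
			threads[i].memdie = true;
			marked++;
		}
	}
	return marked;
}

/* A thread only needs the out_of_memory() stand-in if it is not marked. */
static bool model_needs_oom_call(const struct model_thread *t)
{
	return !t->memdie;
}
```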
Yet another approach is to use __GFP_KILLABLE (we can start with it on a
best-effort basis).
> Is
> something other than the LTP test affected to give this more priority?
> Do we have other usecases where something mlocks the whole memory?
This panic was caused by 50 threads sharing an MMF_OOM_SKIP mm, which
exceeded the number of OOM-killable processes. Whether the memory is locked
or not is not important. If a multi-threaded process that consumes little
memory is selected as an OOM victim (and is reaped by the OOM reaper, so
MMF_OOM_SKIP is set immediately), its remaining threads may still select
further OOM victims needlessly.
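The arithmetic of this failure can be modelled in a few lines of user-space C. This is a simplified sketch under stated assumptions: each of `threads` threads sharing the MMF_OOM_SKIP mm calls the out_of_memory() stand-in once, each call that does not take the task_will_free_mem(current) shortcut kills one of `killable` other processes, and running out of candidates corresponds to panic(). The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: returns true if the sequence of out_of_memory()
 * calls runs out of OOM-killable processes (the panic() case).
 */
static bool model_oom_storm_panics(int threads, int killable,
				   bool will_free_mem_returns_false)
{
	assert(threads >= 0 && killable >= 0);

	for (int i = 0; i < threads; i++) {
		/* task_will_free_mem(current) stand-in: if it returned true,
		 * the thread would use reserves and exit, killing nobody. */
		if (!will_free_mem_returns_false)
			continue;

		/* Otherwise this thread selects and kills a new victim. */
		if (killable == 0)
			return true;    /* nothing left to kill: panic() */
		killable--;
	}
	return false;
}
```

With 50 threads and fewer than 50 other killable processes the model reaches the panic state; if the shortcut returned true, no new victims would be selected at all.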