Re: [RFC PATCH] mm, oom: allow oom reaper to race with exit_mmap

From: Michal Hocko
Date: Wed Jul 12 2017 - 03:12:56 EST


On Tue 11-07-17 13:40:04, David Rientjes wrote:
> On Tue, 11 Jul 2017, Michal Hocko wrote:
>
> > This?
> > ---
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 5dc0ff22d567..e155d1d8064f 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -470,11 +470,14 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
> >  {
> >  	struct mmu_gather tlb;
> >  	struct vm_area_struct *vma;
> > -	bool ret = true;
> > 
> >  	if (!down_read_trylock(&mm->mmap_sem))
> >  		return false;
> > 
> > +	/* There is nothing to reap so bail out without signs in the log */
> > +	if (!mm->mmap)
> > +		goto unlock;
> > +
> >  	/*
> >  	 * Tell all users of get_user/copy_from_user etc... that the content
> >  	 * is no longer stable. No barriers really needed because unmapping
> > @@ -508,9 +511,10 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
> >  				K(get_mm_counter(mm, MM_ANONPAGES)),
> >  				K(get_mm_counter(mm, MM_FILEPAGES)),
> >  				K(get_mm_counter(mm, MM_SHMEMPAGES)));
> > +unlock:
> >  	up_read(&mm->mmap_sem);
> > 
> > -	return ret;
> > +	return true;
> >  }
> > 
> >  #define MAX_OOM_REAP_RETRIES 10
>
> Yes, this folded in with the original RFC patch appears to work better
> with light testing.

Yes, folding it into the original patch was the plan. I would really
appreciate some Tested-by here.

> However, I think MAX_OOM_REAP_RETRIES and/or the timeout of HZ/10 needs to
> be increased as well to address the issue that Tetsuo pointed out. The
> oom reaper shouldn't be required to do any work unless it is resolving a
> livelock, and that scenario should be relatively rare. The oom killer
> being a natural ultra slow path, I think it would be justifiable to wait
> longer or retry more times than simply 1 second before declaring that
> reaping is not possible. It reduces the likelihood of additional oom
> killing.

I believe this is an independent issue and as such it should be
addressed separately, along with some data backing up that decision. I
am not against improving the waiting logic, but we would need some
requeuing when we cannot reap a victim, because we cannot afford to wait
too long on a single oom victim when there might be many victims queued
(because of memcg ooms). This would obviously need some more code, and I
am willing to implement it, but I would first like to see that this is a
real problem.

Thanks!
--
Michal Hocko
SUSE Labs