Re: [PATCH 0/10 -v3] Handle oom bypass more gracefully

From: Tetsuo Handa
Date: Fri Jun 03 2016 - 11:17:54 EST


Michal Hocko wrote:
> On Fri 03-06-16 21:00:31, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > Patch 8 is new in this version and it addresses an issue pointed out
> > > by 0-day OOM report where an oom victim was reaped several times.
> >
> > I believe we need below once-you-nacked patch as well.
> >
> > It would be possible to clear victim->signal->oom_flag_origin when
> > that victim gets TIF_MEMDIE, but I think that moving oom_task_origin()
> > test to oom_badness() will allow oom_scan_process_thread() which calls
> > oom_unkillable_task() only for testing task->signal->oom_victims to be
> > removed by also moving task->signal->oom_victims test to oom_badness().
> > Thus, I prefer this way.
>
> Can we please forget about oom_task_origin for _now_. At least until we
> resolve the current pile? I am really skeptical oom_task_origin is a
> real problem and even if you think it might be and pulling its handling
> outside of oom_scan_process_thread would be better for other reasons we
> can do that later. Or do you insist this all has to be done in one go?
>
> To be honest, I feel less and less confident as the pile grows and
> chances of introducing new bugs just grows after each rebase which tries
> to address more subtle and unlikely issues.
>
> Do no take me wrong but I would rather make sure that the current pile
> is reviewed and no unintentional side effects are introduced than open
> yet another can of worms.
>
> Thanks!

We have to open yet another can of worms because you insist on using
"decision by feedback from the OOM reaper" than "decision by timeout". ;-)

To be honest, I don't think we need to apply this pile. What is missing for
handling subtle and unlikely issues is "eligibility check for not to select
the same victim forever" (i.e. always set MMF_OOM_REAPED or OOM_SCORE_ADJ_MIN,
and check them before exercising the shortcuts).

Current 4.7-rc1 code will be sufficient (and sometimes even better than
involving user visible changes / selecting next OOM victim without delay)
if we started with "decision by timer" (e.g.
http://lkml.kernel.org/r/201601072026.JCJ95845.LHQOFOOSMFtVFJ@xxxxxxxxxxxxxxxxxxx )
approach.

As long as you insist on "decision by feedback from the OOM reaper",
we have to guarantee that the OOM reaper is always invoked in order to
handle subtle and unlikely cases.