Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free
From: Michal Hocko
Date: Tue Nov 22 2016 - 11:25:53 EST
On Tue 22-11-16 17:14:02, Vlastimil Babka wrote:
> On 11/22/2016 05:06 PM, Marc MERLIN wrote:
> > On Mon, Nov 21, 2016 at 01:56:39PM -0800, Marc MERLIN wrote:
> >> On Mon, Nov 21, 2016 at 10:50:20PM +0100, Vlastimil Babka wrote:
> >>>> 4.9rc5 however seems to be doing better, and is still running after 18
> >>>> hours. However, I got a few page allocation failures as per below, but the
> >>>> system seems to recover.
> >>>> Vlastimil, do you want me to continue the copy on 4.9 (may take 3-5 days)
> >>>> or is that good enough, and i should go back to 4.8.8 with that patch applied?
> >>>> https://marc.info/?l=linux-mm&m=147423605024993
> >>>
> >>> Hi, I think it's enough for 4.9 for now and I would appreciate trying
> >>> 4.8 with that patch, yeah.
> >>
> >> So the good news is that it's been running for almost 5H and so far so good.
> >
> > And the better news is that the copy is still going strong, 4.4TB and
> > going. So 4.8.8 is fixed with that one single patch as far as I'm
> > concerned.
> >
> > So thanks for that, looks good to me to merge.
>
> Thanks a lot for the testing. So what do we do now about 4.8? (4.7 is
> already EOL AFAICS).
>
> - send the patch [1] as 4.8-only stable. Greg won't like that, I expect.
> - alternatively a simpler (again, 4.8-only) patch that just outright
> prevents OOM for 0 < order < costly, as Michal already suggested.
> - backport 10+ compaction patches to 4.8 stable
> - something else?
>
> Michal? Linus?
Dunno. To be honest I do not like [1] because it seriously tweaks the
retry logic. 10+ compaction patches to 4.8 seems too much for a stable
tree and quite risky as well. Considering that 4.9 works so much
better, is there any strong reason to do a 4.8-specific fix at all? Most
users reporting OOM regressions seemed to be ok with what 4.8 does
currently AFAIR. I hate that Marc is not falling into that category but
is it really a problem for you to run with 4.9? If we have more users
seeing this regression then I would rather go with a simpler 4.8-only
"never trigger OOM for order > 0 && order < costly" patch, because that would at
least have deterministic behavior.
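
For illustration only, the rule quoted above could be sketched roughly as
below. This is not the actual patch and not the 4.8 allocator code: the
helper name and the "costly" parameter are hypothetical, and the threshold
is passed in so the sketch takes no position on where exactly the costly
boundary sits.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not taken from any kernel tree.
 *
 * Returns true when a failed allocation of the given order should never
 * invoke the OOM killer under the "order > 0 && order < costly" rule.
 * "costly" is an assumed threshold supplied by the caller.
 */
bool order_exempt_from_oom(unsigned int order, unsigned int costly)
{
	/* Order-0 requests are not covered by the rule. */
	if (order == 0)
		return false;

	/* Orders at or above the threshold are not covered either. */
	if (order >= costly)
		return false;

	/* 0 < order < costly: never trigger OOM for these. */
	return true;
}
```

The point of such a check is that the outcome depends only on the request
order, not on how many retries compaction happened to get, which is what
makes the behavior deterministic.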
>
> [1] https://marc.info/?l=linux-mm&m=147423605024993
--
Michal Hocko
SUSE Labs