Re: OOM detection regressions since 4.7
From: Greg KH
Date: Mon Aug 22 2016 - 09:31:57 EST

On Mon, Aug 22, 2016 at 12:54:41PM +0200, Michal Hocko wrote:
> On Mon 22-08-16 06:05:28, Greg KH wrote:
> > On Mon, Aug 22, 2016 at 11:37:07AM +0200, Michal Hocko wrote:
> [...]
> > > > From 899b738538de41295839dca2090a774bdd17acd2 Mon Sep 17 00:00:00 2001
> > > > From: Michal Hocko <mhocko@xxxxxxxx>
> > > > Date: Mon, 22 Aug 2016 10:52:06 +0200
> > > > Subject: [PATCH] mm, oom: prevent pre-mature OOM killer invocation for high
> > > > order request
> > > >
> > > > There have been several reports about pre-mature OOM killer invocation
> > > > in 4.7 kernel when order-2 allocation request (for the kernel stack)
> > > > invoked OOM killer even during basic workloads (light IO or even kernel
> > > > compile on some filesystems). In all reported cases the memory is
> > > > fragmented and there are no order-2+ pages available. There is usually
> > > > a large amount of slab memory (usually dentries/inodes) and further
> > > > debugging has shown that there are way too many unmovable blocks which
> > > > are skipped during the compaction. Multiple reporters have confirmed that
> > > > the current linux-next which includes [1] and [2] helped and OOMs are
> > > > not reproducible anymore. A simpler fix for the stable is to simply
> > > > ignore the compaction feedback and retry as long as there is a reclaim
> > > > progress for high order requests which we used to do before. We already
> > > > do that for CONFIG_COMPACTION=n so let's reuse the same code when
> > > > compaction is enabled as well.
> > > >
> > > > [1] http://lkml.kernel.org/r/20160810091226.6709-1-vbabka@xxxxxxx
> > > > [2] http://lkml.kernel.org/r/f7a9ea9d-bb88-bfd6-e340-3a933559305a@xxxxxxx
> > > >
> > > > Fixes: 0a0337e0d1d1 ("mm, oom: rework oom detection")
> > > > Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
> > > > ---
> > > > mm/page_alloc.c | 50 ++------------------------------------------------
> > > > 1 file changed, 2 insertions(+), 48 deletions(-)
> >
> > So, if this goes into Linus's tree, can you let stable@xxxxxxxxxxxxxxx
> > know about it so we can add it to the 4.7-stable tree? Otherwise
> > there's not much I can do here now, right?
>
> My plan would actually be not to push this to Linus, because we have a
> proper fix for Linus's tree. It is just that the fix is quite large, and
> I felt that stable should get the simplest fix possible, which is this
> partial revert. So what I am proposing is to push a non-Linus
> patch to stable as it is simpler.

I _REALLY_ hate taking any patches that are not in Linus's tree as 90%
of the time (well, almost always), it ends up being wrong and hurting us
in the end.

What exactly are the commits that are in Linus's tree that resolve this
issue?

thanks,

greg k-h