Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

From: Trevor Cordes
Date: Thu Jan 19 2017 - 04:49:34 EST


On 2017-01-17 Michal Hocko wrote:
> On Tue 17-01-17 14:21:14, Mel Gorman wrote:
> > On Tue, Jan 17, 2017 at 02:52:28PM +0100, Michal Hocko wrote:
> > > On Mon 16-01-17 11:09:34, Mel Gorman wrote:
> > > [...]
> > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > index 532a2a750952..46aac487b89a 100644
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -2684,6 +2684,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
> > > >  				continue;
> > > >
> > > >  			if (sc->priority != DEF_PRIORITY &&
> > > > +			    !buffer_heads_over_limit &&
> > > >  			    !pgdat_reclaimable(zone->zone_pgdat))
> > > >  				continue;	/* Let kswapd poll it */
> > >
> > > I think we should rather remove pgdat_reclaimable here. This
> > > sounds like a wrong layer to decide whether we want to reclaim
> > > and how much.
> >
> > I had considered that but it'd also be important to add the other
> > 32-bit patches you have posted to see the impact. Because of the
> > ratio of LRU pages to slab pages, it may not have an impact but
> > it'd need to be eliminated.
>
> OK, Trevor you can pull from
> git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git tree
> fixes/highmem-node-fixes branch. This contains the current mmotm tree
> + the latest highmem fixes. I also do not expect this would help much
> in your case but as Mel've said we should rule that out at least.

Hi! A kernel built from the git tree above OOM'd after < 24 hours (at
3:02am), so it doesn't solve the bug. If you need a dump of the OOM
messages, let me know.

Let me know what to try next, guys, and I'll test it out.

> > Before prototyping such a thing, I'd like to hear the outcome of
> > this heavy hack and then add your 32-bit patches onto the list. If
> > the problem is still there then I'd next look at taking slab pages
> > into account in pgdat_reclaimable() instead of an outright removal
> > that has a much wider impact. If that doesn't work then I'll
> > prototype a heavy-handed forced slab reclaim when lower zones are
> > almost all slab pages.

I don't think I've tried the "heavy hack" patch yet; as far as I can
tell it's not in the mhocko tree I just tried. Should I try the heavy
hack on top of the mhocko git tree, on vanilla, or on something else?

I also want to mention that these PAE boxes suffer from another
problem/bug that I've worked around for almost a year now. For some
reason it keeps gnawing at me that it might be related. Disk I/O goes
to pot on these PAE boxes after a certain amount of disk writes (some
unknown number of GB, maybe around 10). Writes drop from 500MB/s to
10MB/s!! Reboot and it's magically back to 500MB/s. I detail this
here:
https://muug.ca/pipermail/roundtable/2016-June/004669.html
My fix was to boot with mem=XG where X is <8 (like 4 or 6) to force the
PAE kernel to be more sane about highmem choices. I never filed a bug
because I read a ton of stuff saying Linus hates PAE, don't use over
4G, and so on. But the other fix is to:
set /proc/sys/vm/highmem_is_dirtyable to 1
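
For anyone who wants to script that knob, here is a minimal sketch in
Python. It assumes a 32-bit kernel built with CONFIG_HIGHMEM (the file
does not exist otherwise) and root privileges for the write. The helper
names read_knob and write_knob are made up for illustration.

```python
from pathlib import Path

# Assumption: this file exists only on 32-bit kernels built with CONFIG_HIGHMEM.
KNOB = Path("/proc/sys/vm/highmem_is_dirtyable")


def read_knob(path=KNOB):
    """Return the current setting as an int, or None if the knob is absent."""
    try:
        return int(Path(path).read_text().strip())
    except FileNotFoundError:
        return None


def write_knob(value, path=KNOB):
    """Write 0 or 1 to the knob (hypothetical helper; needs root on a real box)."""
    if value not in (0, 1):
        raise ValueError("highmem_is_dirtyable accepts only 0 or 1")
    Path(path).write_text(f"{value}\n")
```

Calling write_knob(1) as root should be equivalent to echoing 1 into the
file by hand. The setting does not survive a reboot unless it is also
added to the sysctl configuration.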

I'm not bringing this up to get attention to a new bug, I bring this up
because it smells like it might be related. If something slowly eats
away at the box's VM to the point that I/O gets horribly slow, perhaps
it's related to the slab and highmem/lowmem issue we have here? And if
it is related, it may help to solve the oom bug. If I'm way off base here,
just ignore my tangent!
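
One way to check whether the two symptoms track each other would be to
log lowmem and slab usage over time. The sketch below is only an
illustration: the helper names are made up, and it assumes a highmem
kernel, since LowTotal and LowFree appear in /proc/meminfo only there.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its numeric value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def lowmem_summary(fields):
    """Keep only the fields relevant to lowmem pressure, slab and writeback."""
    keys = ("LowTotal", "LowFree", "Slab", "SReclaimable",
            "SUnreclaim", "Dirty", "Writeback")
    return {k: fields[k] for k in keys if k in fields}


def read_summary(path="/proc/meminfo"):
    """Read the given meminfo file and return the lowmem/slab summary."""
    with open(path) as f:
        return lowmem_summary(parse_meminfo(f.read()))
```

Running read_summary() from cron every few minutes and appending the
result to a log would show whether LowFree shrinks and Slab grows in the
hours before the slowdown or the oom.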

The funny thing is I thought mem=XG where X<8 solved the problem, but
it doesn't! It greatly mitigates it, but I still get a subtle slowdown
that gets worse over time (over weeks instead of days). I now set
highmem_is_dirtyable=1 on most boxes, and in combination with mem=XG
that seems to solve it for good. Let me note, however, that I have NOT
set highmem_is_dirtyable=1 on the test box I am using for all of this
building/testing, as I wanted the config to stay static while I work
through this oom bug. (I'm really curious to see if
highmem_is_dirtyable=1 would have any impact on the oom though!)
Thanks!