Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)
From: Michal Hocko
Date: Thu Jan 19 2017 - 06:39:26 EST
On Thu 19-01-17 03:48:50, Trevor Cordes wrote:
> On 2017-01-17 Michal Hocko wrote:
> > On Tue 17-01-17 14:21:14, Mel Gorman wrote:
> > > On Tue, Jan 17, 2017 at 02:52:28PM +0100, Michal Hocko wrote:
> > > > On Mon 16-01-17 11:09:34, Mel Gorman wrote:
> > > > [...]
> > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > index 532a2a750952..46aac487b89a 100644
> > > > > --- a/mm/vmscan.c
> > > > > +++ b/mm/vmscan.c
> > > > > @@ -2684,6 +2684,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
> > > > >  			continue;
> > > > >
> > > > >  		if (sc->priority != DEF_PRIORITY &&
> > > > > +		    !buffer_heads_over_limit &&
> > > > >  		    !pgdat_reclaimable(zone->zone_pgdat))
> > > > >  			continue;	/* Let kswapd poll it */
> > > >
> > > > I think we should rather remove pgdat_reclaimable here. This
> > > > sounds like a wrong layer to decide whether we want to reclaim
> > > > and how much.
> > >
> > > I had considered that but it'd also be important to add the other
> > > 32-bit patches you have posted to see the impact. Because of the
> > > ratio of LRU pages to slab pages, it may not have an impact but
> > > it'd need to be eliminated.
> >
> > OK, Trevor you can pull from
> > git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git tree
> > fixes/highmem-node-fixes branch. This contains the current mmotm tree
> > + the latest highmem fixes. I also do not expect this would help much
> > in your case but as Mel has said we should rule that out at least.
>
> Hi! The kernel from the git tree above oom'd after < 24 hours (at 3:02am),
> so it doesn't solve the bug. If you need an oom messages dump let me know.
Yes please.
> Let me know what to try next, guys, and I'll test it out.
>
> > > Before prototyping such a thing, I'd like to hear the outcome of
> > > this heavy hack and then add your 32-bit patches onto the list. If
> > > the problem is still there then I'd next look at taking slab pages
> > > into account in pgdat_reclaimable() instead of an outright removal
> > > that has a much wider impact. If that doesn't work then I'll
> > > prototype a heavy-handed forced slab reclaim when lower zones are
> > > almost all slab pages.
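Just to make that option concrete, a completely untested sketch of what
"taking slab pages into account" in pgdat_reclaimable_pages() could look
like - note that NR_SLAB_RECLAIMABLE is still a per-zone counter here so it
has to be summed up over the node's zones:

	static unsigned long pgdat_reclaimable_pages(struct pglist_data *pgdat)
	{
		unsigned long nr;
		int z;

		nr = node_page_state_snapshot(pgdat, NR_ACTIVE_FILE) +
		     node_page_state_snapshot(pgdat, NR_INACTIVE_FILE) +
		     node_page_state_snapshot(pgdat, NR_ISOLATED_FILE);

		if (get_nr_swap_pages() > 0)
			nr += node_page_state_snapshot(pgdat, NR_ACTIVE_ANON) +
			      node_page_state_snapshot(pgdat, NR_INACTIVE_ANON) +
			      node_page_state_snapshot(pgdat, NR_ISOLATED_ANON);

		/* sketch: count reclaimable slab as well so that a node which
		 * is mostly slab doesn't look unreclaimable */
		for (z = 0; z < MAX_NR_ZONES; z++)
			nr += zone_page_state_snapshot(&pgdat->node_zones[z],
						       NR_SLAB_RECLAIMABLE);

		return nr;
	}

That way the scanned < reclaimable*6 check wouldn't declare a node
unreclaimable when most of its pages sit in slab.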
>
> I don't think I've tried the "heavy hack" patch yet? It's not in the
> mhocko tree I just tried? Should I try the heavy hack on top of mhocko
> git or on vanilla or what?
>
> I also want to mention that these PAE boxes suffer from another
> problem/bug that I've worked around for almost a year now. For some
> reason it keeps gnawing at me that it might be related. The disk I/O
> goes to pot on this/these PAE boxes after a certain amount of disk
> writes (like some unknown number of GB, around 10-ish maybe). Like
> writes go from 500MB/s to 10MB/s!! Reboot and it's magically 500MB/s
> again. I detail this here:
> https://muug.ca/pipermail/roundtable/2016-June/004669.html
> My fix was to boot with mem=XG where X is <8 (like 4 or 6) to force the PAE
> kernel to be more sane about highmem choices. I never filed a bug
> because I read a ton of stuff saying Linus hates PAE, don't use over
> 4G, blah blah. But the other fix is to:
> set /proc/sys/vm/highmem_is_dirtyable to 1
Yes, this sounds like dirty memory throttling and there were some
changes in that area. I do not remember when exactly.
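Just for context, the dirty limits are computed from the "dirtyable" memory,
and on 32b highmem is excluded from that unless the sysctl is set. Roughly
(paraphrased from mm/page-writeback.c, the details differ between kernel
versions):

	static unsigned long global_dirtyable_memory(void)
	{
		unsigned long x;

		x = global_page_state(NR_FREE_PAGES);
		x -= min(x, totalreserve_pages);

		x += global_node_page_state(NR_INACTIVE_FILE);
		x += global_node_page_state(NR_ACTIVE_FILE);

		if (!vm_highmem_is_dirtyable)
			x -= highmem_dirtyable_memory(x);

		return x + 1;	/* Ensure that we never return 0 */
	}

With most of your 8GB sitting in highmem the resulting dirty thresholds end
up tiny, which would be consistent with writeback throttling your writes
down to a crawl and with highmem_is_dirtyable=1 papering over it.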
> I'm not bringing this up to get attention to a new bug, I bring this up
> because it smells like it might be related. If something slowly eats
> away at the box's vm to the point that I/O gets horribly slow, perhaps
> it's related to the slab and high/lowmem issue we have here? And if
> related, it may help to solve the oom bug. If I'm way off base here,
> just ignore my tangent!
From your OOM reports so far it doesn't really seem related because you
never had a large number of pages under writeback at OOM time.
The situation with the PAE kernel is unfortunate but it is really hard
to do anything about it considering that the kernel and most of its
allocations have to live in the small and scarce lowmem. Moreover, the more
memory you have, the more the kernel has to allocate from that lowmem just
to manage it. This is why not only Linus hates 32b kernels on large memory
systems.
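Just to give a back of the envelope idea of the scaling (assuming the usual
3G/1G split and ~32-40B per struct page on 32b), with e.g. 8GB of RAM:

	8GB / 4KB per page  = ~2M struct pages
	2M * ~32B           = ~64MB of lowmem for the memmap alone

out of roughly 896MB of lowmem, before counting any slab, page tables,
buffer heads etc. A 64b kernel simply doesn't have this restriction.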
--
Michal Hocko
SUSE Labs