Re: [PATCH 1/2] mm: add per-zone lru list stat

From: Mel Gorman
Date: Wed Jul 20 2016 - 06:56:08 EST


On Wed, Jul 20, 2016 at 09:16:24AM +0900, Minchan Kim wrote:
> On Tue, Jul 19, 2016 at 05:48:57PM +0100, Mel Gorman wrote:
> > On Wed, Jul 20, 2016 at 12:50:32AM +0900, Minchan Kim wrote:
> > > While stress testing with hackbench, I frequently got OOM messages,
> > > which never happened with zone-lru.
> > >
> >
> > This one also showed pgdat going unreclaimable early. Have you tried any
> > of the three oom-related patches I sent to Joonsoo to see what impact,
> > if any, they had?
>
> Before the result, I want to restate the goal of this patch.
> Without per-zone lru stats, it's really hard to debug OOM problems on a
> system with multiple zones, so regardless of whether it solves the
> problem, we should add per-zone lru stats for OOM debuggability, since
> OOM handling has never been a perfect solution.
>

That's not in dispute, I simply wanted to know the impact.

> The result is not an OOM; instead, hackbench stalls forever.

Ok, that points to both the premature marking of pgdats as unreclaimable
and the inactive rotation being problems.

I have a series prepared that may or may not address the problem. I'm
trying to reproduce the OOM killer on a 32-bit KVM but so far no luck.
If I fail to reproduce it, then I cannot tell whether the series has an
impact, and I may have to post it and hope that you and Joonsoo can test it.

--
Mel Gorman
SUSE Labs