Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

From: Michal Hocko
Date: Fri Nov 06 2020 - 02:43:51 EST


On Fri 06-11-20 12:32:44, Huang, Ying wrote:
> Michal Hocko <mhocko@xxxxxxxx> writes:
>
> > On Thu 05-11-20 09:40:28, Feng Tang wrote:
> >> On Wed, Nov 04, 2020 at 09:53:43AM +0100, Michal Hocko wrote:
> >>
> >> > > > As I've said in reply to your second patch, I think we can make the oom
> >> > > > killer behavior more sensible in these misconfigured cases, but I do not
> >> > > > think we want to break the cpuset isolation for such a configuration.
> >> > >
> >> > > Do you mean we skip the killing and just let the allocation fail? We
> >> > > checked the oom killer code first: when the oom happens, both the DRAM
> >> > > node and the unmovable node have lots of free memory, and killing a
> >> > > process won't improve the situation.
> >> >
> >> > We already skip the oom killer and fail for lowmem allocation requests.
> >> > This is similar in some sense. Another option would be to kill the
> >> > allocating context, which will potentially have fewer corner cases because
> >> > some allocation failures might be unexpected.
> >>
> >> Yes, this can avoid a pointless oom kill of a good process (one under no
> >> memory pressure at all).
> >>
> >> And I think the important thing is to judge whether this usage (binding
> >> a docker-like workload to an unmovable node) is a valid case :)
> >
> > I am confused. Why would an unmovable node be a problem? Movable
> > allocations can be satisfied from Zone Normal just fine. It is the other
> > way around that is a problem.
> >
> >> Initially, I thought it invalid too, but later came to think it still makes
> >> some sense for these 2 cases:
> >> * a user wants to bind their workload to one node (most of user space
> >> memory) to avoid cross-node traffic, and that node happens to
> >> be configured as unmovable
> >
> > See above
> >
> >> * one small DRAM node + big PMEM node, and memory latency insensitive
> >> workload could be bound to the cheaper unmovable PMEM node
> >
> > Please elaborate some more. As long as you have movable and normal nodes
> > then this should be possible with a good deal of care - most notably the
> > movable:kernel memory ratio shouldn't be too big.
> >
> > Besides that, why does a PMEM node have to be MOVABLE only in the first
> > place?
>
> The performance of PMEM is much worse than that of DRAM. If we find
> that some pages on PMEM are accessed frequently (hot), we may want to
> move them to DRAM to optimize the system performance. If unmovable
> pages are allocated on PMEM and become hot, it's possible that we cannot
> move them to DRAM without rebooting the system. So we think we should
> make the PMEM nodes MOVABLE only.

That is fair, but then you really need a fallback node too. So this is
a mere optimization rather than a fundamental restriction.
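
To make the fallback point concrete, here is a toy model. It is not
kernel code: the Node class, allocate() and the page counts are all made
up for illustration. It only assumes what was said above, namely that
movable requests can be served from either zone while unmovable (kernel)
requests need Zone Normal.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical NUMA node with per-zone free page counts."""
    name: str
    free_normal: int   # pages usable by any allocation
    free_movable: int  # pages usable by movable allocations only


def allocate(nodes, allowed, movable):
    """Take one page from the first allowed node that can serve the request.

    Returns the node name, or None if no allowed node can serve it.
    """
    for node in nodes:
        if node.name not in allowed:
            continue  # the binding (think cpuset) excludes this node
        if movable and node.free_movable > 0:
            node.free_movable -= 1
            return node.name
        if node.free_normal > 0:
            # Zone Normal can serve movable and unmovable requests alike.
            node.free_normal -= 1
            return node.name
    return None


dram = Node("dram", free_normal=1000, free_movable=0)
pmem = Node("pmem", free_normal=0, free_movable=100000)  # MOVABLE only
nodes = [pmem, dram]

# Bound to the MOVABLE-only node: a movable request is fine, an unmovable
# one fails even though both nodes have plenty of free memory.
movable_ok = allocate(nodes, {"pmem"}, movable=True)
only_pmem = allocate(nodes, {"pmem"}, movable=False)

# With a fallback node in the allowed set the unmovable request succeeds.
with_fallback = allocate(nodes, {"pmem", "dram"}, movable=False)
```

In this model only_pmem ends up as None while with_fallback is served
from the DRAM node, which is the situation a binding without a fallback
node runs into.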
--
Michal Hocko
SUSE Labs