Re: [PATCH] mm,oom: Try last second allocation before and after selecting an OOM victim.

From: Michal Hocko
Date: Wed Nov 01 2017 - 10:48:52 EST


On Wed 01-11-17 23:38:49, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Wed 01-11-17 20:58:50, Tetsuo Handa wrote:
> > > > > But doing ALLOC_OOM for last second allocation attempt from out_of_memory() involve
> > > > > duplicating code (e.g. rebuilding zone list).
> > > >
> > > > Why would you do it? Do not blindly copy and paste code without
> > > > a good reason. What kind of problem does this actually solve?
> > >
> > > prepare_alloc_pages()/finalise_ac() initializes as
> > >
> > > ac->high_zoneidx = gfp_zone(gfp_mask);
> > > ac->zonelist = node_zonelist(preferred_nid, gfp_mask);
> > > ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
> > > ac->high_zoneidx, ac->nodemask);
> > >
> > > and selecting as an OOM victim reinitializes as
> > >
> > > ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
> > > ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
> > > ac->high_zoneidx, ac->nodemask);
> > >
> > > and I assume that this reinitialization might affect which memory reserve
> > > the OOM victim allocates from.
> > >
> > > You mean such difference is too trivial to care about?
> >
> > You keep repeating what the _current_ code does without explaining _why_
> > do we need the same thing in the oom path. Could you finaly answer my
> > question please?
>
> Because I consider that following what the current code does is reasonable
> unless there are explicit reasons not to follow.

Following this pattern makes a code mess over time because nobody
remembers why something is done a specific way anymore. Everybody just
keeps the ball rolling because he is afraid to change the code he
doesn't understand. Don't do that!

[...]
> Does "that comment" refer to
>
> Elaborating the comment: the reason for the high wmark is to reduce
> the likelihood of livelocks and be sure to invoke the OOM killer, if
> we're still under pressure and reclaim just failed. The high wmark is
> used to be sure the failure of reclaim isn't going to be ignored. If
> using the min wmark like you propose there's risk of livelock or
> anyway of delayed OOM killer invocation.
>
> part? Then, I know it is not about gfp flags.
>
> But how can OOM livelock happen when the last second allocation does not
> wait for memory reclaim (because __GFP_DIRECT_RECLAIM is masked) ?
> The last second allocation shall return immediately, and we will call
> out_of_memory() if the last second allocation failed.

I think Andrea just wanted to say that we do want to invoke OOM killer
and resolve the memory pressure rather than keep looping in the
reclaim/oom path just because there are few pages allocated and freed in
the meantime.

[...]
> > I am not sure such a scenario matters all that much because it assumes
> > that the oom victim doesn't really free much memory [1] (basically less than
> > HIGH-MIN). Most OOM situation simply have a memory hog consuming
> > significant amount of memory.
>
> The OOM killer does not always kill a memory hog consuming significant amount
> of memory. The OOM killer kills a process with highest OOM score (and instead
> one of its children if any). I don't think that assuming an OOM victim will free
> memory enough to succeed ALLOC_WMARK_HIGH is appropriate.

OK, so let's agree to disagree. I claim that we shouldn't care all that
much. If any of the current heuristics turns out to lead to killing too
many tasks then we should simply remove it rather than keep bloating an
already complex code with more and more kluges.

--
Michal Hocko
SUSE Labs