Re: [patch] mm, vmscan: abort futile reclaim if we've been oomkilled

From: David Rientjes
Date: Wed Nov 13 2013 - 19:48:47 EST


On Wed, 13 Nov 2013, Johannes Weiner wrote:

> > The reclaim will fail, the only reason current has TIF_MEMDIE set is
> > because reclaim has completely failed.
>
> ...for somebody else.
>

That process is in the same allocating context as current, otherwise
current would not have been selected as a victim. The oom killer tries to
only kill processes that will lead to future memory freeing where reclaim
has failed.

> > I don't know of any other "random places" other than when the oom killed
> > process is sitting in reclaim before it is selected as the victim. Once
> > it returns to the page allocator, it will immediately allocate and then be
> > able to handle its pending SIGKILL. The one spot identified where it is
> > absolutely pointless to spin is in reclaim since it is virtually
> > guaranteed to fail. This patch fixes that issue.
>
> No, this applies to every other operation that does not immediately
> lead to the task exiting or which creates more system load. Readahead
> would be another example. They're all pointless and you could do
> without all of them at this point, but I'm not okay with putting these
> checks in random places that happen to bother you right now. It's not
> a proper solution to the problem.
>

If you have an alternative solution, please feel free to propose it and
I'll try it out.

This isn't only about the cond_resched() in shrink_slab(); the reclaim is
going to fail. There should be no instances where an oom killed process
can go and start magically reclaiming memory that would have prevented it
from becoming oom in the first place. I have seen the oom killer trigger
and the victim stall for several seconds before actually allocating memory,
and that stall is pointless. We're also not touching a hotpath here; we're
in direct reclaim already.
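
To make the argument above concrete, here is a minimal userspace sketch of
the idea, not the actual patch. Everything in it is hypothetical and exists
only for illustration: struct task_model, its oom_victim flag (standing in
for TIF_MEMDIE), struct reclaim_model, and model_direct_reclaim(). It only
shows that a reclaim loop which checks the victim flag first does no passes
at all for a task that has already been oom killed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; oom_victim models TIF_MEMDIE. */
struct task_model {
    bool oom_victim;
};

/* Hypothetical reclaim state, for illustration only. */
struct reclaim_model {
    size_t reclaimable;   /* pages a reclaim pass could still free */
    unsigned passes_run;  /* reclaim passes actually executed */
};

/*
 * Run up to max_passes reclaim passes, each freeing at most one page.
 * Returns the number of pages freed. If the task is already an OOM
 * victim, the loop exits before doing any work for that pass.
 */
size_t model_direct_reclaim(const struct task_model *task,
                            struct reclaim_model *rm,
                            unsigned max_passes)
{
    size_t freed = 0;

    assert(task != NULL && rm != NULL);

    for (unsigned pass = 0; pass < max_passes; pass++) {
        /* The check under discussion: a victim stops reclaiming. */
        if (task->oom_victim)
            break;

        rm->passes_run++;
        if (rm->reclaimable > 0) {
            rm->reclaimable--;
            freed++;
        }
    }
    return freed;
}
```

In this model a victim with nothing reclaimable returns immediately with
zero passes run, while a non-victim still runs every pass it is given.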

> Is it a good idea to let ~700 processes simultaneously go into direct
> global reclaim?
>
> The victim aborting reclaim still leaves you with ~699 processes
> spinning in reclaim that should instead just retry the allocation as
> well. What about them?
>

Um, no, those processes are going through a repeated loop of direct
reclaim, calling the oom killer, iterating the tasklist, finding an
existing oom killed process that has yet to exit, and looping. They
wouldn't loop for too long if we can reduce the amount of time that it
takes for that oom killed process to exit.
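
The loop described above can be sketched as a small userspace model. This
is an illustration under stated assumptions, not kernel code: struct
tl_task, model_victim_pending(), and model_allocator_retries() are
hypothetical names, and the victim's exit is simulated by a simple
iteration count (exit_delay) rather than real scheduling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tasklist entry, for illustration only. */
struct tl_task {
    bool oom_victim;  /* already selected by the OOM killer */
    bool exited;      /* has finished exiting and freed its memory */
};

/* Scan the modeled tasklist for a victim that has not exited yet. */
bool model_victim_pending(const struct tl_task *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tasks[i].oom_victim && !tasks[i].exited)
            return true;
    return false;
}

/*
 * Model one allocating task: each iteration stands for "direct reclaim
 * fails, the OOM killer is called, an existing victim is found, retry".
 * All victims are marked exited once exit_delay iterations have passed.
 * Returns how many times the allocator looped.
 */
unsigned model_allocator_retries(struct tl_task *tasks, size_t n,
                                 unsigned exit_delay)
{
    unsigned loops = 0;

    assert(tasks != NULL);

    while (model_victim_pending(tasks, n)) {
        loops++;
        if (loops >= exit_delay) {
            /* Simulated exit: the victim is gone, memory is freed. */
            for (size_t i = 0; i < n; i++)
                if (tasks[i].oom_victim)
                    tasks[i].exited = true;
        }
    }
    return loops;
}
```

In this model the number of retries tracks exit_delay directly, which is
the point being made: the sooner the victim exits, the sooner everyone
else stops looping.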

> The situation your setups seem to get in frequently is bananas, don't
> micro optimize this.
>

Unless you propose an alternative solution, this is the patch that fixes
the problem where a process gets oom killed and then stalls for seconds
before it actually retries allocating memory.

Thanks for your thoughts.