[PATCH 0/2] oom detection rework followups
From: Michal Hocko
Date: Mon Apr 11 2016 - 02:46:29 EST
Hi,
while playing with the hugetlb test case described in [1] on a swapless
system I managed to get my machine into an endless loop inside the
allocator. At first I found out that the reclaim vs. compaction interaction
doesn't work quite well. See patch 2 for more details. It still bothered
me, though, why did_some_progress didn't break out of the loop. After
some more debugging it turned out that it is compaction_ready, used in
the reclaim path, which has been broken for quite some time. That's where
patch 1 comes in, and it is something to apply regardless of the rest of
the series.
I was thinking whether to mark it for stable but cannot decide one way
or the other. I think the fix is obvious but I am not so sure about all
the potential side effects. A wrong compaction_ready decision would
cause do_try_to_free_pages to break out early rather than dropping the
reclaim priority and spending more time scanning LRUs. I have a hard time
judging how good/bad this might be, considering that compaction might
decide to defer or just do something useful between reclaim rounds.
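
To make the early break-out concrete, here is a minimal userspace sketch of
the priority loop. It is not the kernel code: the sketch_ names, the
simplified scan control structure and the two sample shrink callbacks are
assumptions made for illustration. Only the general shape (start at the
default priority of 12, stop when enough was reclaimed or when
compaction_ready is set, otherwise drop the priority) follows the
description above.

```c
#include <stdbool.h>

#define SKETCH_DEF_PRIORITY 12	/* mirrors the kernel's DEF_PRIORITY */

/* Simplified stand-in for struct scan_control (illustration only). */
struct sketch_scan_control {
	int priority;
	bool compaction_ready;
	unsigned long nr_reclaimed;
	unsigned long nr_to_reclaim;
};

/* Hypothetical callback: one reclaim round, returns pages reclaimed. */
typedef unsigned long (*sketch_shrink_fn)(struct sketch_scan_control *sc);

/* Reclaims nothing and never claims that compaction is ready. */
unsigned long sketch_shrink_nothing(struct sketch_scan_control *sc)
{
	(void)sc;
	return 0;
}

/* Reclaims nothing but (wrongly) reports that compaction is ready. */
unsigned long sketch_shrink_wrongly_ready(struct sketch_scan_control *sc)
{
	sc->compaction_ready = true;
	return 0;
}

/* Returns the number of reclaim rounds performed before giving up. */
int sketch_try_to_free_pages(struct sketch_scan_control *sc,
			     sketch_shrink_fn shrink)
{
	int rounds = 0;

	sc->priority = SKETCH_DEF_PRIORITY;
	do {
		sc->nr_reclaimed += shrink(sc);
		rounds++;

		if (sc->nr_reclaimed >= sc->nr_to_reclaim)
			break;
		/* A wrong decision here ends reclaim after one round. */
		if (sc->compaction_ready)
			break;
	} while (--sc->priority >= 0);

	return rounds;
}
```

With sketch_shrink_nothing the loop walks every priority from 12 down to 0
(13 rounds); with sketch_shrink_wrongly_ready it stops after the first round
without any reclaim progress.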
While patch 1 solved the issue I was seeing, I still think that patch
2 is reasonable as well. It fixed the issue as well, but it is not
strictly needed (at least for the above-mentioned load) right now. On the
other hand I like how it resembles the reclaim retry logic and puts some
bound on when it makes sense to retry.
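
The idea of bounding the retries can be sketched in userspace as follows.
This is only an illustration of a bounded retry loop, not the code from
patch 2: the helper names, the callbacks and the limit of 16 are all
assumptions.

```c
#include <stdbool.h>

/* Assumed limit for illustration; not taken from the patch. */
#define SKETCH_MAX_COMPACT_RETRIES 16

/* Hypothetical helper: allow another attempt only while under the bound. */
bool sketch_should_compact_retry(int *compaction_retries)
{
	if (++(*compaction_retries) > SKETCH_MAX_COMPACT_RETRIES)
		return false;
	return true;
}

/* Sample allocation callbacks (illustration only). */
bool sketch_alloc_never_succeeds(void)
{
	return false;
}

bool sketch_alloc_succeeds(void)
{
	return true;
}

/* Returns how many allocation attempts were made before stopping. */
int sketch_alloc_attempts(bool (*try_alloc)(void))
{
	int compaction_retries = 0;
	int attempts = 0;

	for (;;) {
		attempts++;
		if (try_alloc())
			break;
		/* Without this bound a failing allocation loops forever. */
		if (!sketch_should_compact_retry(&compaction_retries))
			break;
	}

	return attempts;
}
```

An allocation that never succeeds gives up after the initial attempt plus
16 retries instead of looping endlessly.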
So in short, patch 1 should go in regardless of the oom detection rework,
which might take some time to settle down (assuming I haven't missed
something and the fix is really correct), and patch 2 would be good to go
on top of the current series.
---
[1] http://lkml.kernel.org/r/1459855533-4600-12-git-send-email-mhocko@xxxxxxxxxx