Re: mmotm woes, mainly compaction

From: Michal Hocko
Date: Tue Apr 12 2016 - 08:10:45 EST


On Tue 12-04-16 00:18:00, Hugh Dickins wrote:
> Michal, I'm sorry to say that I now find that I misinformed you.
>
> You'll remember when we were chasing the order=2 OOMs on two of my
> machines at the end of March (in private mail). And you sent me a
> mail containing two patches, the second "Another thing to try ...
> so this on top" doing a *migrate_mode++.
>
> I answered you definitively that the first patch worked,
> so "I haven't tried adding the one below at all".
>
> Not true, I'm afraid. Although I had split the *migrate_mode++ one
> off into a separate patch that I did not apply, I found looking back
> today (when trying to work out why order=2 OOMs were still a problem
> on mmotm 2016-04-06) that I never deleted that part from the end of
> the first patch; so in fact what I'd been testing had included the
> second; and now I find that _it_ was the effective solution.
>
> Which is particularly sad because I think we were both a bit
> uneasy about the *migrate_mode++ one: partly the style of it
> incrementing the enum; but more seriously that it advances all the
> way to MIGRATE_SYNC, when the first went only to MIGRATE_SYNC_LIGHT.

Yeah, I was thinking about this some more and I have dropped the
MIGRATE_SYNC patch because it is just too dangerous. It gets all the
way to writeout() and that is a great stack overflow hazard. But I
guess we do not need the writeout, and wait_on_page_writeback (done
from __unmap_and_move) would be sufficient. I was already thinking
about splitting MIGRATE_SYNC into two states: one allowing waiting on
events and the other additionally allowing the writeout.
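
Roughly, the split would behave like this (only a sketch to make the
idea concrete; the helper names are made up for illustration and the
checks mirror what the patch below does in __unmap_and_move() and
fallback_migrate_page()):

#include <stdbool.h>	/* only so the sketch compiles standalone */

enum migrate_mode {
	MIGRATE_ASYNC,
	MIGRATE_SYNC_LIGHT,
	MIGRATE_SYNC,		/* may wait on events, e.g. wait_on_page_writeback() */
	MIGRATE_SYNC_WRITEOUT,	/* may additionally start the IO via ->writepage */
};

/* both sync modes are allowed to wait for outstanding writeback */
static inline bool migration_may_wait(enum migrate_mode mode)
{
	return mode >= MIGRATE_SYNC;
}

/* only the topmost mode may start the IO itself (i.e. call writeout()) */
static inline bool migration_may_writeout(enum migrate_mode mode)
{
	return mode == MIGRATE_SYNC_WRITEOUT;
}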

> But without it, I am still stuck with the order=2 OOMs.
>
> And worse: after establishing that that fixes the order=2 OOMs for
> me on 4.6-rc2-mm1, I thought I'd better check that the three you
> posted today (the 1/2 classzone_idx one, the 2/2 prevent looping
> forever, and the "ction-abstract-compaction-feedback-to-helpers-fix";
> but I'm too far behind to consider or try the RFC thp backoff one)
> (a) did not surprisingly fix it on their own, and (b) worked well
> with the *migrate_mode++ one added in.

I am not really sure what you have been testing here. The hugetlb load
or the same make on tmpfs? I would be really surprised if any of the
above patches made any difference for the make workload.

> (a) as you'd expect, they did not help on their own; and (b) they
> worked fine together on the G5 (until it hit the powerpc swapping
> sigsegv, which I think the powerpc guys are hoping is a figment of
> my imagination); but (b) they did not work fine together on the
> laptop, that combination now gives it order=1 OOMs. Despair.

Something is definitely wrong here. I have already seen compaction
giving surprising results in some cases. I have seen that Vlastimil
has posted some fixes, so maybe this is a side effect of one of those.
I also hope to come up with a reasonable set of trace points to tell
us more, but let's see whether the order-2 issue can be solved first.

This is still with the ugly enum++, but let's close our eyes and think
about something nicer...

Thanks!
---
diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index ebf3d89a3919..e1947d7af63f 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -6,11 +6,14 @@
  *	on most operations but not ->writepage as the potential stall time
  *	is too significant
  * MIGRATE_SYNC will block when migrating pages
+ * MIGRATE_SYNC_WRITEOUT will trigger the IO when migrating pages. Make sure
+ * to not use this flag from deep stacks.
  */
 enum migrate_mode {
 	MIGRATE_ASYNC,
 	MIGRATE_SYNC_LIGHT,
 	MIGRATE_SYNC,
+	MIGRATE_SYNC_WRITEOUT,
 };
 
 #endif /* MIGRATE_MODE_H_INCLUDED */
diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
index 539b25a76111..0f14c65865ad 100644
--- a/include/trace/events/migrate.h
+++ b/include/trace/events/migrate.h
@@ -9,7 +9,8 @@
 #define MIGRATE_MODE						\
 	EM( MIGRATE_ASYNC,	"MIGRATE_ASYNC")		\
 	EM( MIGRATE_SYNC_LIGHT,	"MIGRATE_SYNC_LIGHT")		\
-	EMe(MIGRATE_SYNC,	"MIGRATE_SYNC")
+	EM( MIGRATE_SYNC,	"MIGRATE_SYNC")			\
+	EMe(MIGRATE_SYNC_WRITEOUT, "MIGRATE_SYNC_WRITEOUT")
 
 
 #define MIGRATE_REASON						\
diff --git a/mm/compaction.c b/mm/compaction.c
index 68dfbc07692d..7f631a6e234f 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1839,7 +1839,7 @@ static void compact_node(int nid)
 {
 	struct compact_control cc = {
 		.order = -1,
-		.mode = MIGRATE_SYNC,
+		.mode = MIGRATE_SYNC_WRITEOUT,
 		.ignore_skip_hint = true,
 	};
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5a544c6c0717..a591b29a25ba 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1571,7 +1571,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
 	}
 
 	ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
-				MIGRATE_SYNC, MR_MEMORY_FAILURE);
+				MIGRATE_SYNC_WRITEOUT, MR_MEMORY_FAILURE);
 	if (ret) {
 		pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
 			pfn, ret, page->flags);
@@ -1651,7 +1651,7 @@ static int __soft_offline_page(struct page *page, int flags)
 					page_is_file_cache(page));
 		list_add(&page->lru, &pagelist);
 		ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
-					MIGRATE_SYNC, MR_MEMORY_FAILURE);
+					MIGRATE_SYNC_WRITEOUT, MR_MEMORY_FAILURE);
 		if (ret) {
 			if (!list_empty(&pagelist)) {
 				list_del(&page->lru);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7ad0d2eb9a2c..6cd8664c9e6e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1554,7 +1554,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		 * migrate_pages returns # of failed pages.
 		 */
 		ret = migrate_pages(&source, alloc_migrate_target, NULL, 0,
-					MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
+					MIGRATE_SYNC_WRITEOUT, MR_MEMORY_HOTPLUG);
 		if (ret)
 			putback_movable_pages(&source);
 	}
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 7f80ebcd6552..a6a947980773 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1001,7 +1001,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
 
 	if (!list_empty(&pagelist)) {
 		err = migrate_pages(&pagelist, new_node_page, NULL, dest,
-					MIGRATE_SYNC, MR_SYSCALL);
+					MIGRATE_SYNC_WRITEOUT, MR_SYSCALL);
 		if (err)
 			putback_movable_pages(&pagelist);
 	}
@@ -1242,7 +1242,7 @@ static long do_mbind(unsigned long start, unsigned long len,
 		if (!list_empty(&pagelist)) {
 			WARN_ON_ONCE(flags & MPOL_MF_LAZY);
 			nr_failed = migrate_pages(&pagelist, new_page, NULL,
-				start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
+				start, MIGRATE_SYNC_WRITEOUT, MR_MEMPOLICY_MBIND);
 			if (nr_failed)
 				putback_movable_pages(&pagelist);
 		}
diff --git a/mm/migrate.c b/mm/migrate.c
index 028814625eea..3e907354cfec 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -809,7 +809,7 @@ static int fallback_migrate_page(struct address_space *mapping,
 {
 	if (PageDirty(page)) {
 		/* Only writeback pages in full synchronous migration */
-		if (mode != MIGRATE_SYNC)
+		if (mode != MIGRATE_SYNC_WRITEOUT)
 			return -EBUSY;
 		return writeout(mapping, page);
 	}
@@ -938,7 +938,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 		 * the retry loop is too short and in the sync-light case,
 		 * the overhead of stalling is too much
 		 */
-		if (mode != MIGRATE_SYNC) {
+		if (mode < MIGRATE_SYNC) {
 			rc = -EBUSY;
 			goto out_unlock;
 		}
@@ -1187,7 +1187,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 		return -ENOMEM;
 
 	if (!trylock_page(hpage)) {
-		if (!force || mode != MIGRATE_SYNC)
+		if (!force || mode < MIGRATE_SYNC)
 			goto out;
 		lock_page(hpage);
 	}
@@ -1447,7 +1447,7 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
 	err = 0;
 	if (!list_empty(&pagelist)) {
 		err = migrate_pages(&pagelist, new_page_node, NULL,
-				(unsigned long)pm, MIGRATE_SYNC, MR_SYSCALL);
+				(unsigned long)pm, MIGRATE_SYNC_WRITEOUT, MR_SYSCALL);
 		if (err)
 			putback_movable_pages(&pagelist);
 	}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6d1da0ceaf1e..d80c9755ffc7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3030,8 +3030,8 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	 * failure could be caused by weak migration mode.
 	 */
 	if (compaction_failed(compact_result)) {
-		if (*migrate_mode == MIGRATE_ASYNC) {
-			*migrate_mode = MIGRATE_SYNC_LIGHT;
+		if (*migrate_mode < MIGRATE_SYNC) {
+			(*migrate_mode)++;
 			return true;
 		}
 		return false;
--
Michal Hocko
SUSE Labs