Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node

From: Yang Shi
Date: Thu Dec 05 2019 - 20:12:07 EST




On 12/5/19 4:19 PM, Qian Cai wrote:

On Dec 5, 2019, at 7:04 PM, John Hubbard <jhubbard@xxxxxxxxxx> wrote:

Felix's code is not random test code. It's code he wrote and he expected it to work.
Sure, but could he show a bit detail if the kernel will start to behavior as expected like what was written in the manpage to return ENOENT in this case, why is it not acceptable? i.e., why is it important to depend on the broken behavior?


I think I found the root cause. It did return -ENOENT when the syscall was introduced (my first impression was wrong), but the behavior was changed since 2.6.28 by commit e78bbfa82624 ("mm: stop returning -ENOENT from sys_move_pages() if nothing got migrated"). The full commit log is as below:

Author: Brice Goglin <Brice.Goglin@xxxxxxxx>
Date:ÂÂ Sat Oct 18 20:27:15 2008 -0700

ÂÂÂ mm: stop returning -ENOENT from sys_move_pages() if nothing got migrated

 A patchset reworking sys_move_pages(). It removes the possibly large
ÂÂÂ vmalloc by using multiple chunks when migrating large buffers. It also
ÂÂÂ dramatically increases the throughput for large buffers since the lookup
ÂÂÂ in new_page_node() is now limited to a single chunk, causing the quadratic
 complexity to have a much slower impact. There is no need to use any
ÂÂÂ radix-tree-like structure to improve this lookup.

ÂÂÂ sys_move_pages() duration on a 4-quadcore-opteron 2347HE (1.9Gz),
ÂÂÂ migrating between nodes #2 and #3:

ÂÂÂÂÂÂÂÂÂÂÂ lengthÂÂÂÂÂÂÂÂÂ move_pages (us)ÂÂÂÂÂÂÂÂ move_pages+patch (us)
ÂÂÂÂÂÂÂÂÂÂÂ 4kBÂÂÂÂÂÂÂÂÂÂÂÂ 126ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 98
ÂÂÂÂÂÂÂÂÂÂÂ 40kBÂÂÂÂÂÂÂÂÂÂÂ 198ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 168
ÂÂÂÂÂÂÂÂÂÂÂ 400kBÂÂÂÂÂÂÂÂÂÂ 963ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 937
ÂÂÂÂÂÂÂÂÂÂÂ 4MBÂÂÂÂÂÂÂÂÂÂÂÂ 12503ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 11930
ÂÂÂÂÂÂÂÂÂÂÂ 40MBÂÂÂÂÂÂÂÂÂÂÂ 246867ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 11848

ÂÂÂ Patches #1 and #4 are the important ones:
ÂÂÂ 1) stop returning -ENOENT from sys_move_pages() if nothing got migrated
ÂÂÂ 2) don't vmalloc a huge page_to_node array for do_pages_stat()
ÂÂÂ 3) extract do_pages_move() out of sys_move_pages()
ÂÂÂ 4) rework do_pages_move() to work on page_sized chunks
ÂÂÂ 5) move_pages: no need to set pp->page to ZERO_PAGE(0) by default

ÂÂÂ This patch:

ÂÂÂ There is no point in returning -ENOENT from sys_move_pages() if all pages
ÂÂÂ were already on the right node, while we return 0 if only 1 page was not.
ÂÂÂ Most application don't know where their pages are allocated, so it's not
ÂÂÂ an error to try to migrate them anyway.

ÂÂÂ Just return 0 and let the status array in user-space be checked if the
ÂÂÂ application needs details.

ÂÂÂ It will make the upcoming chunked-move_pages() support much easier.

ÂÂÂ Signed-off-by: Brice Goglin <Brice.Goglin@xxxxxxxx>
ÂÂÂ Acked-by: Christoph Lameter <cl@xxxxxxxxxxxxxxxxxxxx>
ÂÂÂ Cc: Nick Piggin <nickpiggin@xxxxxxxxxxxx>
ÂÂÂ Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
ÂÂÂ Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>


So the behavior was changed in kernel intentionally but never reflected in the manpage. I will propose a patch to fix the document.