Re: [v2 PATCH 2/2] mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind

From: Yang Shi
Date: Wed Jul 17 2019 - 15:25:16 EST




On 7/17/19 11:50 AM, Vlastimil Babka wrote:
On 7/17/19 8:23 PM, Yang Shi wrote:

On 7/16/19 10:28 AM, Yang Shi wrote:

On 7/16/19 5:07 AM, Vlastimil Babka wrote:
On 6/22/19 2:20 AM, Yang Shi wrote:
@@ -969,10 +975,21 @@ static long do_get_mempolicy(int *policy,
nodemask_t *nmask,
 /*
ÂÂ * page migration, thp tail pages can be passed.
ÂÂ */
-static void migrate_page_add(struct page *page, struct list_head
*pagelist,
+static int migrate_page_add(struct page *page, struct list_head
*pagelist,
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ unsigned long flags)
 {
ÂÂÂÂÂ struct page *head = compound_head(page);
+
+ÂÂÂ /*
+ * Non-movable page may reach here. And, there may be
+ÂÂÂÂ * temporaty off LRU pages or non-LRU movable pages.
+ÂÂÂÂ * Treat them as unmovable pages since they can't be
+ * isolated, so they can't be moved at the moment. It
+ÂÂÂÂ * should return -EIO for this case too.
+ÂÂÂÂ */
+ÂÂÂ if (!PageLRU(head) && (flags & MPOL_MF_STRICT))
+ÂÂÂÂÂÂÂ return -EIO;
+
Hm but !PageLRU() is not the only way why queueing for migration can
fail, as can be seen from the rest of the function. Shouldn't all cases
be reported?
Do you mean the shared pages and isolation failed pages? I'm not sure
whether we should consider these cases break the semantics or not, so
I leave them as they are. But, strictly speaking they should be
reported too, at least for the isolation failed page.
CC'd linux-api, should be done on v3 posting also.

By reading mbind man page, it says:

If MPOL_MF_MOVE is specified in flags, then the kernel will attempt to
move all the existing pages in the memory range so that they follow the
policy. Pages that are shared with other processes will not be moved.
If MPOL_MF_STRICT is also specified, then the call fails with the error
EIO if some pages could not be moved.
I don't think this means that for shared pages, -EIO should not be
reported. I can imagine both interpretations of the paragraph. I guess
we can be conservative and keep not reporting them, if that was always
the case - but then perhaps clarify the man page?

Yes, I agree the man page does looks ambiguous. Anyway, I think we could add a patch later to kernel or manpage for either interpretations once it gets clarified.


It looks the code already handles shared page correctly, we just need
return -EIO for isolation failed page if MPOL_MF_STRICT is specified.

Thanks,
Yang

ÂÂÂÂÂ /*
ÂÂÂÂÂÂ * Avoid migrating a page that is shared with others.
ÂÂÂÂÂÂ */
@@ -984,6 +1001,8 @@ static void migrate_page_add(struct page *page,
struct list_head *pagelist,
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ hpage_nr_pages(head));
ÂÂÂÂÂÂÂÂÂ }
ÂÂÂÂÂ }
+
+ÂÂÂ return 0;
 }
  /* page allocation callback for NUMA node migration */
@@ -1186,9 +1205,10 @@ static struct page *new_page(struct page
*page, unsigned long start)
 }
 #else
 -static void migrate_page_add(struct page *page, struct list_head
*pagelist,
+static int migrate_page_add(struct page *page, struct list_head
*pagelist,
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ unsigned long flags)
 {
+ÂÂÂ return -EIO;
 }
  int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,