Re: [PATCH] mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind

From: Yang Shi
Date: Thu Jun 20 2019 - 12:08:44 EST

On 6/20/19 12:18 AM, Vlastimil Babka wrote:
On 6/19/19 8:19 PM, Yang Shi wrote:
This is getting even more muddy TBH. Is there any reason that we have to
handle this problem during the isolation phase rather than the migration?
I think it was already said that if pages can't be isolated, then
migration phase won't process them, so they're just ignored.

However I think the patch is wrong to abort immediately when
encountering a page that cannot be isolated (AFAICS). IMHO it should
still try to migrate everything it can, and only then return -EIO.
That is fine too. I don't see the mbind semantics defining how to handle
such a case other than returning -EIO.
I think it does. There's:
If MPOL_MF_MOVE is specified in flags, then the kernel *will attempt to
move all the existing pages* ... If MPOL_MF_STRICT is also specified,
then the call fails with the error *EIO if some pages could not be moved*

Aborting immediately would be against the attempt to move all.

Looking into the code, it is not as easy as I thought.
do_mbind() checks the return value of queue_pages_range(); it applies
the policy and manipulates vmas only when the return value is 0
(success), and then migrates the pages on the list. We could put the
movable pages on the list by not breaking immediately, but they would be
ignored. If we migrate the pages regardless of the return value, it may
break the policy, since the policy will *not* be applied at all.
I think we just need to remember if there was at least one page that
failed isolation or migration, but keep working, and in the end return
EIO if there were any such pages. I don't think it breaks the policy. Once
pages are allocated in a mapping, changing the policy is a best effort
thing anyway.
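
The "remember the failure but keep working" idea can be illustrated with a small user-space model. This is only a sketch, not kernel code: struct toy_page, toy_move_all() and the movable flag are hypothetical names invented for the illustration, and "moving" a page is reduced to changing its node number.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; not the kernel's struct page. */
struct toy_page {
    int node;      /* node the page currently sits on */
    bool movable;  /* false models a page that cannot be isolated */
};

/*
 * Walk every page, move the ones that can be moved, and remember
 * whether any page had to be skipped. The error is reported only
 * after the whole range has been processed.
 */
static int toy_move_all(struct toy_page *pages, size_t n, int target_node,
                        size_t *moved)
{
    bool failed = false;

    *moved = 0;
    for (size_t i = 0; i < n; i++) {
        if (pages[i].node == target_node)
            continue;            /* already in place */
        if (!pages[i].movable) {
            failed = true;       /* record the failure, but keep going */
            continue;
        }
        pages[i].node = target_node;
        (*moved)++;
    }
    return failed ? -EIO : 0;
}
```

With one unmovable page in a range of three, this model still moves the other two and only then reports -EIO, which is the "attempt to move all" reading of the man page text quoted above.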

The current behavior is:
If queue_pages_range() returns -EIO (the vma is not migratable; ignore other conditions, since we are only concerned with page migration here), the policy won't be set and no page will be migrated.
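
As a rough model of that control flow (a sketch only, assuming the flow is exactly as described in this mail; struct toy_result and toy_do_mbind_current() are hypothetical names, and queue_err merely stands in for the return value of queue_pages_range()):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of what a toy mbind call ended up doing. */
struct toy_result {
    bool policy_set;
    bool pages_migrated;
    int err;
};

/*
 * queue_err stands in for the return value of queue_pages_range().
 * A non-zero value skips both the policy update and the migration
 * step, so the caller only sees the error.
 */
static struct toy_result toy_do_mbind_current(int queue_err)
{
    struct toy_result r = { false, false, queue_err };

    if (queue_err == 0) {
        r.policy_set = true;      /* apply the policy, adjust vmas */
        r.pages_migrated = true;  /* then migrate the queued pages */
    }
    return r;
}
```

Calling toy_do_mbind_current(-EIO) leaves both flags false, which is why movable pages queued before the failure would simply be ignored.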

However, the problem here is that the vma might look migratable while some or all of the underlying pages are unmovable. So my patch treats the vma as *not* migratable if at least one page is unmovable. I'm not sure whether it is possible to have both movable and unmovable pages in the same mapping; I'd suppose the vma would have been split much earlier.

If we don't abort immediately and instead record whether there is an unmovable page, then we could either:
#1. Still follow the current behavior (then why not abort immediately?)
#2. Set the mempolicy, then migrate all the migratable pages. But we may end up with pages on node A while the policy says node B. Doesn't that break the policy?
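
To make the concern in #2 concrete, here is a toy user-space model (not kernel code; struct toy_page, toy_apply_then_migrate() and vma_policy are hypothetical names used only for this illustration). It sets the policy first, migrates what it can, and counts the pages left on a node the policy does not name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; not the kernel's struct page. */
struct toy_page {
    int node;      /* node the page currently sits on */
    bool movable;  /* false models an unmovable page */
};

/*
 * Option #2 in miniature: record the new policy node, then move
 * every movable page to it. Returns how many pages remain on a
 * node other than the one the policy now names.
 */
static size_t toy_apply_then_migrate(struct toy_page *pages, size_t n,
                                     int policy_node, int *vma_policy)
{
    size_t stranded = 0;

    *vma_policy = policy_node;   /* the policy is applied regardless */
    for (size_t i = 0; i < n; i++) {
        if (pages[i].node == policy_node)
            continue;
        if (pages[i].movable)
            pages[i].node = policy_node;
        else
            stranded++;          /* stays on the old node */
    }
    return stranded;
}
```

A non-zero return value here corresponds to the state in question: the policy says node B, yet some pages still sit on node A.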