Re: [PATCH] mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind

From: Vlastimil Babka
Date: Fri Jun 21 2019 - 07:33:52 EST

Next message: Stephen Rothwell: "linux-next: Fixes tags need some work in the arm64 tree"
Previous message: Matthias Brugger: "[PATCH] [media] media: mtk-mdp: fix reference count on old device tree"
In reply to: Yang Shi: "Re: [PATCH] mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind"
Next in thread: Yang Shi: "Re: [PATCH] mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 6/20/19 6:08 PM, Yang Shi wrote:
>
>
> On 6/20/19 12:18 AM, Vlastimil Babka wrote:
>> On 6/19/19 8:19 PM, Yang Shi wrote:
>>>>>> This is getting even more muddy TBH. Is there any reason that we
>>>>>> have to
>>>>>> handle this problem during the isolation phase rather the migration?
>>>>> I think it was already said that if pages can't be isolated, then
>>>>> migration phase won't process them, so they're just ignored.
>>>> Yesïexactly.
>>>>
>>>>> However I think the patch is wrong to abort immediately when
>>>>> encountering such page that cannot be isolated (AFAICS). IMHO it should
>>>>> still try to migrate everything it can, and only then return -EIO.
>>>> It is fine too. I don't see mbind semantics define how to handle such
>>>> case other than returning -EIO.
>> I think it does. There's:
>> If MPOL_MF_MOVE is specified in flags, then the kernel *will attempt to
>> move all the existing pages* ... If MPOL_MF_STRICT is also specified,
>> then the call fails with the error *EIO if some pages could not be moved*
>>
>> Aborting immediately would be against the attempt to move all.
>>
>>> By looking into the code, it looks not that easy as what I thought.
>>> do_mbind() would check the return value of queue_pages_range(), it just
>>> applies the policy and manipulates vmas as long as the return value is 0
>>> (success), then migrate pages on the list. We could put the movable
>>> pages on the list by not breaking immediately, but they will be ignored.
>>> If we migrate the pages regardless of the return value, it may break the
>>> policy since the policy will *not* be applied at all.
>> I think we just need to remember if there was at least one page that
>> failed isolation or migration, but keep working, and in the end return
>> EIO if there was such page(s). I don't think it breaks the policy. Once
>> pages are allocated in a mapping, changing the policy is a best effort
>> thing anyway.
>
> The current behavior is:
> If queue_pages_range() return -EIO (vma is not migratable, ignore other
> conditions since we just focus on page migration), the policy won't be
> set and no page will be migrated.

Ah, I see. IIUC the current behavior is due to your recent commit
a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when
MPOL_MF_STRICT is specified") in order to fix commit 6f4576e3687b
("mempolicy: apply page table walker on queue_pages_range()"), which
caused -EIO to be not returned enough. But I think you went too far and
instead return -EIO too much. If I look at the code in parent commit of
6f4576e3687b, I can see in queue_pages_range():

if ((flags & MPOL_MF_STRICT) ||
((flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) &&
vma_migratable(vma))) {

err = queue_pages_pgd_range(vma, start, endvma, nodes,
flags, private);
if (err)
break;
}

and in queue_pages_pte_range():

if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
migrate_page_add(page, private, flags);
else
break;

So originally, there was no returning of -EIO due to !vma_migratable() -
as long as MPOL_MF_STRICT and MPOL_MF_MOVE* was specified, the code
tried to queue for migration everything it could and didn't ever abort,
AFAICS. And I still think that's the best possible behavior.

> However, the problem here is the vma might look migratable, but some or
> all the underlying pages are unmovable. So, my patch assumes the vma is
> *not* migratable if at least one page is unmovable. I'm not sure if it
> is possible to have both movable and unmovable pages for the same
> mapping or not, I'm supposed the vma would be split much earlier.
>
> If we don't abort immediately, then we record if there is unmovable
> page, then we could do:
> #1. Still follows the current behavior (then why not abort immediately?)

See above how the current behavior differs from the original one.

> #2. Set mempolicy then migrate all the migratable pages. But, we may end
> up with the pages on node A, but the policy says node B. Doesn't it
> break the policy?

The policy can already be "broken" (violated is probably better word) by
migrate_pages() failing. If that happens, we don't rollback the migrated
pages and reset the policy back, right? I think the manpage is clear
that MPOL_MF_MOVE is a best-effort. Userspace will know that not
everything was successfully migrated (via -EIO), and can take whatever
steps it deems necessary - attempt rollback, determine which exact
page(s) are violating the policy, etc.

>>
>>>>
>

Next message: Stephen Rothwell: "linux-next: Fixes tags need some work in the arm64 tree"
Previous message: Matthias Brugger: "[PATCH] [media] media: mtk-mdp: fix reference count on old device tree"
In reply to: Yang Shi: "Re: [PATCH] mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind"
Next in thread: Yang Shi: "Re: [PATCH] mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]