Re: [PATCH v4 3/3] mm/mempolicy: add nodes_empty check in SYSC_migrate_pages
From: Yisheng Xie
Date: Sun Dec 03 2017 - 19:50:17 EST
Hi Vlastimil,
On 2017/12/1 23:18, Vlastimil Babka wrote:
> On 12/01/2017 10:55 AM, Yisheng Xie wrote:
>> As in manpage of migrate_pages, the errno should be set to EINVAL when
>> none of the node IDs specified by new_nodes are on-line and allowed by the
>> process's current cpuset context, or none of the specified nodes contain
>> memory. However, when tested with the following case:
>>
>> new_nodes = 0;
>> old_nodes = 0xf;
>> ret = migrate_pages(pid, old_nodes, new_nodes, MAX);
>>
>> ret is 0 and errno is not set. Since new_nodes is empty, we
>> should expect EINVAL as documented.
>>
>> To fix cases like the above, this patch checks whether the target nodes
>> ANDed with the current task_nodes are empty, and then whether the result
>> ANDed with node_states[N_MEMORY] is empty.
>>
>> Meanwhile, this patch also removes the EPERM check on CAP_SYS_NICE.
>> The caller of migrate_pages should be able to migrate the target process's
>> pages anywhere the caller can allocate memory, if the caller can access
>> the mm_struct.
>>
>> Signed-off-by: Yisheng Xie <xieyisheng1@xxxxxxxxxx>
>> Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx>
>> Cc: Chris Salls <salls@xxxxxxxxxxx>
>> Cc: Christopher Lameter <cl@xxxxxxxxx>
>> Cc: David Rientjes <rientjes@xxxxxxxxxx>
>> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>> Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
>> Cc: Tan Xiaojun <tanxiaojun@xxxxxxxxxx>
>> Cc: Vlastimil Babka <vbabka@xxxxxxx>
>> ---
>> v3:
>> * check whether the nodemask is empty after ANDing with the current task's
>> nodes, and then after ANDing with the nodes which have memory
>> v4:
>> * remove the check of EPERM on CAP_SYS_NICE.
>>
>> Hi Vlastimil and Christopher,
>>
>> Could you please help to review this version?
>
> Hi, I think we should stay with v3 after all. What I missed when
> reviewing it, is that the EPERM check is for cpuset_mems_allowed(task)
> and in v3 you add EINVAL check for cpuset_mems_allowed(current), which
> may not be the same, and the intention of CAP_SYS_NICE is not whether we
> can bypass our own cpuset, but whether we can bypass the target task's
> cpuset. Sorry for the confusion.
Ok, so please ignore this version.
Thanks
Yisheng Xie
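
For reference, the ordering of the checks discussed above can be sketched as a
small user-space model. This is only an illustration under stated assumptions,
not the kernel code: nodemask_model_t and model_migrate_checks() are
hypothetical names, a plain unsigned long stands in for nodemask_t, and the
EPERM condition is assumed to be a subset test against the target task's
allowed nodes. The EINVAL checks use the caller's allowed nodes, as in v3.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's nodemask_t: one bit per node. */
typedef unsigned long nodemask_model_t;

/*
 * User-space model of the check order discussed in this thread.
 * target_allowed models cpuset_mems_allowed(task),
 * caller_allowed models cpuset_mems_allowed(current),
 * memory_nodes   models node_states[N_MEMORY].
 */
static int model_migrate_checks(nodemask_model_t new_nodes,
                                nodemask_model_t target_allowed,
                                nodemask_model_t caller_allowed,
                                nodemask_model_t memory_nodes,
                                bool has_cap_sys_nice)
{
    /*
     * Assumed EPERM rule: going outside the target task's cpuset
     * requires CAP_SYS_NICE.
     */
    if ((new_nodes & ~target_allowed) != 0 && !has_cap_sys_nice)
        return -EPERM;

    /* EINVAL if no requested node is allowed for the caller. */
    new_nodes &= caller_allowed;
    if (new_nodes == 0)
        return -EINVAL;

    /* EINVAL if none of the remaining nodes has memory. */
    new_nodes &= memory_nodes;
    if (new_nodes == 0)
        return -EINVAL;

    return 0;
}
```

In this model, the test case from the changelog (new_nodes = 0) passes the
EPERM check trivially and then fails the first EINVAL check, which is the
documented behaviour. It also shows why the two checks are not redundant:
they are made against two different masks.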