Re: [External] Re: [RFC] mm: add new syscall pidfd_set_mempolicy()

From: Zhongkun He
Date: Wed Oct 12 2022 - 03:55:58 EST

Hi Michal, thanks for your reply and suggestions.

Please add some explanation why the cpuset interface is not usable for
that usecase.

To solve the issue, this patch introduces a new syscall
pidfd_set_mempolicy(2). It sets the NUMA memory policy of the thread
specified in pidfd.

In current process context there is no locking because only the process
accesses its own memory policy, so task_work is used in
pidfd_set_mempolicy() to update the mempolicy of the process specified
in pidfd, which avoids both locking and race conditions.
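
As a rough illustration of the deferred-update idea described above, here
is a small userspace model. It is only a sketch: struct model_task,
request_policy_change() and run_pending_work() are hypothetical names made
up for this example and are not the kernel's task_work API. It assumes the
target drains its queued work at some safe point of its own, for example
before returning to user mode.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; all names here are hypothetical. */
struct model_task;
typedef void (*work_fn)(struct model_task *self, int arg);

struct model_task {
    int policy_mode;     /* stands in for task->mempolicy */
    work_fn pending;     /* at most one queued callback, for brevity */
    int pending_arg;
};

/* Runs in the target's own context, so the target is the only writer. */
static void apply_policy(struct model_task *self, int mode)
{
    self->policy_mode = mode;
}

/* Caller side: queue the update and return immediately. */
static int request_policy_change(struct model_task *target, int mode)
{
    assert(target != NULL);
    if (target->pending != NULL)
        return -1;               /* slot busy in this simplified model */
    target->pending = apply_policy;
    target->pending_arg = mode;
    return 0;
}

/* Target side: drain queued work at a safe point chosen by the target. */
static void run_pending_work(struct model_task *self)
{
    work_fn fn = self->pending;

    if (fn == NULL)
        return;
    self->pending = NULL;
    fn(self, self->pending_arg);
}
```

In this model request_policy_change() returns 0 while policy_mode is still
unchanged; the new value only becomes visible after the target itself
calls run_pending_work().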

Why cannot you alter kernel_set_mempolicy (and do_set_mempolicy) to
accept a task rather than operate on current?

I tried that before this patch, but I found a problem. The allocation
and update of the mempolicy happen in the current context, so they are
not protected by any lock. But when the mempolicy is modified by another
process, a race condition appears. Consider something like the following:

pidfd_set_mempolicy              target task stack
                                 mpol = get_task_policy;
old = task->mempolicy;
task->mempolicy = new;
                                 page = __alloc_pages(mpol);
In this situation the old mempolicy can be released while the target
task is still using it. Any suggestions for handling this case would be
welcome.
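
To make the interleaving concrete, here is a single-threaded userspace
model that replays the steps in order. It is only a sketch under stated
assumptions: struct model_policy, model_policy_put() and replay_race() are
hypothetical names for this example, and a "freed" flag stands in for
actually freeing the object so that the model stays well-defined.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; these are not the kernel's definitions. */
struct model_policy {
    int mode;
    int refcnt;
    bool freed;      /* stands in for the real free of the object */
};

struct model_task {
    struct model_policy *mempolicy;
};

/* Drop one reference; mark the policy released when the last one goes. */
static void model_policy_put(struct model_policy *pol)
{
    assert(pol->refcnt > 0);
    if (--pol->refcnt == 0)
        pol->freed = true;
}

/*
 * Replays the interleaving step by step on one thread.
 * Returns true if the target ends up holding a released policy.
 */
static bool replay_race(void)
{
    struct model_policy old_pol = { .mode = 1, .refcnt = 1, .freed = false };
    struct model_policy new_pol = { .mode = 2, .refcnt = 1, .freed = false };
    struct model_task target = { .mempolicy = &old_pol };

    /* target task: mpol = get_task_policy (no reference is taken) */
    struct model_policy *mpol = target.mempolicy;

    /* remote caller: swap in the new policy, then release the old one */
    struct model_policy *old = target.mempolicy;
    target.mempolicy = &new_pol;
    model_policy_put(old);

    /* target task: page = __alloc_pages(mpol) would now use a stale policy */
    return mpol->freed;
}
```

In this model replay_race() returns true, because the target sampled the
pointer without taking a reference before the remote caller dropped the
last one.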

I have to really say that I dislike the task_work approach because it
detaches the syscall from the actual operation and the caller simply
doesn't know when the operation has been completed.

I agree with you. This is indeed a problem.

Please also describe the security it.