Re: [External] Re: [RFC] mm: add new syscall pidfd_set_mempolicy()

From: Zhongkun He
Date: Wed Oct 12 2022 - 04:20:18 EST


On Mon, Oct 10, 2022 at 05:48:42PM +0800, Zhongkun He wrote:
There is usecase that System Management Software(SMS) want to give a
memory policy to other processes to make better use of memory.


Better say "There are use cases where system management utilities
want to apply memory policy to processes to make better use of memory".

The information about how to use memory is not known to the app.
Instead, it is known to the userspace daemon(SMS), and that daemon
will decide the memory usage policy based on different factors.


Better say "These utilities don't set memory usage policy, but
rather the job of reporting memory usage and setting the policy is
offloaded to a userspace daemon."

To solve the issue, this patch introduces a new syscall
pidfd_set_mempolicy(2). it sets the NUMA memory policy of the thread
specified in pidfd.


Better say "To solve the issue above, introduce a new syscall
pidfd_set_mempolicy(2). The syscall sets the NUMA memory policy for the
thread specified in pidfd".

In current process context there is no locking because only the process
accesses its own memory policy, so task_work is used in
pidfd_set_mempolicy() to update the mempolicy of the process specified
in pidfd, avoid using locks and race conditions.


Better say "In current process context there is no locking because
only processes access their own memory policy. For this reason, task_work
is used in pidfd_set_mempolicy() to set or update the mempolicy of the
process specified in pidfd. Thus, it avoids race conditions."

The API is as follows,

long pidfd_set_mempolicy(int pidfd, int mode,
const unsigned long __user *nmask,
unsigned long maxnode,
unsigned int flags);

Set's the [pidfd] task's "task/process memory policy". The pidfd argument
is a PID file descriptor (see pidfd_open(2) man page) that specifies the
process to which the mempolicy is to be applied. The flags argument is
reserved for future use; currently, this argument must be specified as 0.
Please see the set_mempolicy(2) man page for more details about
other's arguments.


Why duplicate the text from Documentation/ below?

Suggested-by: Michal Hocko <mhocko@xxxxxxxx>
Signed-off-by: Zhongkun He <hezhongkun.hzk@xxxxxxxxxxxxx>
---
.../admin-guide/mm/numa_memory_policy.rst | 21 ++++-
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 3 +-
arch/ia64/kernel/syscalls/syscall.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
include/linux/mempolicy.h | 11 +++
include/linux/syscalls.h | 4 +
include/uapi/asm-generic/unistd.h | 5 +-
kernel/sys_ni.c | 1 +
mm/mempolicy.c | 89 +++++++++++++++++++
24 files changed, 146 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index 5a6afecbb0d0..b864dd88b2d2 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -408,9 +408,10 @@ follows:
Memory Policy APIs
==================
-Linux supports 4 system calls for controlling memory policy. These APIS
-always affect only the calling task, the calling task's address space, or
-some shared object mapped into the calling task's address space.
+Linux supports 5 system calls for controlling memory policy. The first four
+APIS affect only the calling task, the calling task's address space, or some
+shared object mapped into the calling task's address space. The last one can
+set the mempolicy of task specified in pidfd.
.. note::
the headers that define these APIs and the parameter data types for
@@ -473,6 +474,20 @@ closest to which page allocation will come from. Specifying the home node overri
the default allocation policy to allocate memory close to the local node for an
executing CPU.
+Set [pidfd Task] Memory Policy::
+
+ long sys_pidfd_set_mempolicy(int pidfd, int mode,
+ const unsigned long __user *nmask,
+ unsigned long maxnode,
+ unsigned int flags);
+
+Set's the [pidfd] task's "task/process memory policy". The pidfd argument is
+a PID file descriptor (see pidfd_open(2) man page) that specifies the process
+to which the mempolicy is to be applied. The flags argument is reserved for
+future use; currently, this argument must be specified as 0. Please see the
+set_mempolicy(2) man page for more details about other's arguments.
+
+
Memory Policy Command Line Interface
====================================

The wording can be improved:

---- >8 ----

diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index b864dd88b2d236..6df35bf4f960bd 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -410,8 +410,8 @@ Memory Policy APIs
Linux supports 5 system calls for controlling memory policy. The first four
APIS affect only the calling task, the calling task's address space, or some
-shared object mapped into the calling task's address space. The last one can
-set the mempolicy of task specified in pidfd.
+shared object mapped into the calling task's address space. The last one
+sets the mempolicy of task specified in the pidfd.
.. note::
the headers that define these APIs and the parameter data types for
@@ -481,11 +481,11 @@ Set [pidfd Task] Memory Policy::
unsigned long maxnode,
unsigned int flags);
-Set's the [pidfd] task's "task/process memory policy". The pidfd argument is
-a PID file descriptor (see pidfd_open(2) man page) that specifies the process
-to which the mempolicy is to be applied. The flags argument is reserved for
-future use; currently, this argument must be specified as 0. Please see the
-set_mempolicy(2) man page for more details about other's arguments.
+Sets the task/process memory policy for the [pidfd] task. The pidfd argument
+is a PID file descriptor (see pidfd_open(2) man page for details) that
+specifies the process to which the mempolicy is applied. The flags
+argument is reserved for future use; currently, it must be specified as 0.
+For the description of all other arguments, see the set_mempolicy(2) man page.

Thanks.


Hi Bagas

I got it, thanks for your suggestions.
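
For anyone following the thread, here is a rough userspace sketch of how a
management daemon might call the interface proposed above. It is only an
illustration under stated assumptions: the RFC has no assigned syscall number,
so SKETCH_NR_PIDFD_SET_MEMPOLICY is a deliberately invalid placeholder (the
call should fail with ENOSYS), and the names nodemask_set(),
sketch_pidfd_set_mempolicy() and bind_pid_to_nodes() are hypothetical helpers,
not part of the patch. The argument order follows the prototype quoted in the
commit message, and nmask/maxnode are built as set_mempolicy(2) describes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Value of MPOL_BIND in <linux/mempolicy.h>, repeated to avoid extra headers. */
#define SKETCH_MPOL_BIND 2

/*
 * ASSUMPTION: the RFC syscall has no assigned number. -1 is never a valid
 * syscall number, so the kernel is expected to answer ENOSYS.
 */
#define SKETCH_NR_PIDFD_SET_MEMPOLICY (-1L)

/* Fallback for old headers; 434 is pidfd_open on x86-64 and arm64. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS 16 /* arbitrary size chosen for this sketch */

struct nodemask {
	unsigned long bits[MASK_WORDS];
};

/* Set the bit for one NUMA node; returns -1 if the node does not fit. */
static int nodemask_set(struct nodemask *m, unsigned int node)
{
	assert(m != NULL);
	if (node >= MASK_WORDS * BITS_PER_ULONG)
		return -1;
	m->bits[node / BITS_PER_ULONG] |= 1UL << (node % BITS_PER_ULONG);
	return 0;
}

/* Hypothetical raw wrapper mirroring the prototype from the commit message. */
static long sketch_pidfd_set_mempolicy(int pidfd, int mode,
				       const unsigned long *nmask,
				       unsigned long maxnode,
				       unsigned int flags)
{
	return syscall(SKETCH_NR_PIDFD_SET_MEMPOLICY, pidfd, mode, nmask,
		       maxnode, flags);
}

/* Open a pidfd for pid and ask for an MPOL_BIND policy on the given nodes. */
long bind_pid_to_nodes(pid_t pid, const unsigned int *nodes, size_t count)
{
	struct nodemask m;
	long ret;
	int pidfd, saved_errno;
	size_t i;

	memset(&m, 0, sizeof(m));
	for (i = 0; i < count; i++) {
		if (nodemask_set(&m, nodes[i]) != 0) {
			errno = EINVAL;
			return -1;
		}
	}

	pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -1;

	/* flags must be 0, as stated in the proposed documentation. */
	ret = sketch_pidfd_set_mempolicy(pidfd, SKETCH_MPOL_BIND, m.bits,
					 MASK_WORDS * BITS_PER_ULONG, 0);
	saved_errno = errno;
	close(pidfd);
	errno = saved_errno;
	return ret;
}
```

A daemon would call something like bind_pid_to_nodes(target_pid, nodes, 2)
with nodes = {0, 1}; with a real syscall number in place of the placeholder,
a return value of 0 would indicate that the policy was queued for the target.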