Re: [PATCH v3 1/1] process_madvise.2: Add process_madvise man page

From: Suren Baghdasaryan
Date: Thu Feb 18 2021 - 14:24:07 EST


On Wed, Feb 17, 2021 at 11:55 PM Michael Kerrisk (man-pages)
<mtk.manpages@xxxxxxxxx> wrote:
>
> Hello Suren,
>
> >> Thanks. I added a few words to clarify this.
> > Any link where I can see the final version?
>
> Sure:
> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/man2/process_madvise.2
>
> Also rendered below.

Looks great. Thanks for improving it, Michael!

>
> Thanks,
>
> Michael
>
> NAME
> process_madvise - give advice about use of memory to a process
>
> SYNOPSIS
> #include <sys/uio.h>
>
> ssize_t process_madvise(int pidfd, const struct iovec *iovec,
> size_t vlen, int advice,
> unsigned int flags);
>
> Note: There is no glibc wrapper for this system call; see NOTES.
>
> DESCRIPTION
> The process_madvise() system call is used to give advice or direc‐
> tions to the kernel about the address ranges of another process or
> of the calling process. It provides the advice for the address
> ranges described by iovec and vlen. The goal of such advice is to
> improve system or application performance.
>
> The pidfd argument is a PID file descriptor (see pidfd_open(2))
> that specifies the process to which the advice is to be applied.
>
> The pointer iovec points to an array of iovec structures, defined
> in <sys/uio.h> as:
>
> struct iovec {
> void *iov_base; /* Starting address */
> size_t iov_len; /* Length of region */
> };
>
> Each iovec structure describes an address range beginning at the
> address iov_base and extending for iov_len bytes.
>
> The vlen argument specifies the number of elements in the iovec array.
> This value must be less than or equal to IOV_MAX (defined in <lim‐
> its.h> or accessible via the call sysconf(_SC_IOV_MAX)).
>
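
A minimal sketch of the iovec and vlen description above: filling an
array of iovec structures and checking the element count against the
run-time IOV_MAX limit. The helper names set_range() and
vlen_within_limit() are hypothetical, and the fallback value of 16 is an
assumption used only when sysconf() cannot report a limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: describe one address range in an iovec element. */
static void set_range(struct iovec *iov, void *start, size_t length)
{
    assert(iov != NULL);
    iov->iov_base = start;   /* Starting address */
    iov->iov_len = length;   /* Length of region in bytes */
}

/* Hypothetical helper: true if vlen does not exceed the IOV_MAX limit. */
static bool vlen_within_limit(size_t vlen)
{
    long limit = sysconf(_SC_IOV_MAX);

    /* Assumption: if the limit is indeterminate, fall back to 16,
       the smallest value POSIX permits for IOV_MAX (_XOPEN_IOV_MAX). */
    if (limit < 0)
        limit = 16;
    return vlen <= (size_t)limit;
}
```

A caller would fill each element with set_range() and refuse to issue the
call when vlen_within_limit() returns false, since the kernel would
reject the request anyway.
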
> The advice argument is one of the following values:
>
> MADV_COLD
> See madvise(2).
>
> MADV_PAGEOUT
> See madvise(2).
>
> The flags argument is reserved for future use; currently, this ar‐
> gument must be specified as 0.
>
> The vlen and iovec arguments are checked before applying any ad‐
> vice. If vlen is too big, or iovec is invalid, then an error will
> be returned immediately and no advice will be applied.
>
> The advice might be applied to only a part of iovec if one of its
> elements points to an invalid memory region in the remote process.
> No further elements will be processed beyond that point. (See the
> discussion regarding partial advice in RETURN VALUE.)
>
> Permission to apply advice to another process is governed by a
> ptrace access mode PTRACE_MODE_READ_REALCREDS check (see
> ptrace(2)); in addition, because of the performance implications
> of applying the advice, the caller must have the CAP_SYS_NICE
> capability.
>
> RETURN VALUE
> On success, process_madvise() returns the number of bytes advised.
> This return value may be less than the total number of requested
> bytes, if an error occurred after some iovec elements were already
> processed. The caller should check the return value to determine
> whether the advice was only partially applied.
>
> On error, -1 is returned and errno is set to indicate the error.
>
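
The return-value check described above could be sketched as follows. This
assumes that advice is applied element by element, as the text implies
("after some iovec elements were already processed"), so the returned
byte count always covers whole elements. The helper name
elements_advised() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: given the value returned by process_madvise() and
 * the iovec array that was passed to it, return how many leading elements
 * are fully covered by the advised byte count. A result smaller than vlen
 * indicates that the advice was only partially applied.
 */
static size_t elements_advised(const struct iovec *iov, size_t vlen,
                               ssize_t ret)
{
    size_t remaining, i;

    assert(iov != NULL || vlen == 0);

    if (ret < 0)              /* Error: nothing was advised */
        return 0;

    remaining = (size_t)ret;
    for (i = 0; i < vlen; i++) {
        if (remaining < iov[i].iov_len)
            break;            /* Element i was not (fully) advised */
        remaining -= iov[i].iov_len;
    }
    return i;
}
```

If the result is less than vlen, the caller can inspect the element at
that index to find the range at which processing stopped.
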
> ERRORS
> EBADF pidfd is not a valid PID file descriptor.
>
> EFAULT The memory described by iovec is outside the accessible ad‐
> dress space of the process referred to by pidfd.
>
> EINVAL flags is not 0.
>
> EINVAL The sum of the iov_len values of iovec overflows a ssize_t
> value.
>
> EINVAL vlen is too large.
>
> ENOMEM Could not allocate memory for internal copies of the iovec
> structures.
>
> EPERM The caller does not have permission to access the address
> space of the process referred to by pidfd.
>
> ESRCH The target process does not exist (i.e., it has terminated
> and been waited on).
>
> VERSIONS
> This system call first appeared in Linux 5.10. Support for this
> system call is optional, depending on the setting of the CON‐
> FIG_ADVISE_SYSCALLS configuration option.
>
> CONFORMING TO
> The process_madvise() system call is Linux-specific.
>
> NOTES
> Glibc does not provide a wrapper for this system call; call it us‐
> ing syscall(2).
>
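
Since there is no glibc wrapper, a caller might define its own on top of
syscall(2), roughly as below. This is a sketch under stated assumptions:
the name sys_process_madvise() is hypothetical, and the fallback numbers
(440 for the system call, 20 for MADV_COLD) are the values used on x86-64
and in the generic headers; they apply only when the installed headers
lack the symbols and may differ on other architectures.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Assumption: fallback values for older headers (x86-64 / generic). */
#ifndef SYS_process_madvise
#define SYS_process_madvise 440
#endif
#ifndef MADV_COLD
#define MADV_COLD 20
#endif

/*
 * Hypothetical wrapper; the name is chosen so that it cannot clash with
 * a declaration that a C library might provide.
 * Returns the number of bytes advised, or -1 with errno set.
 */
static ssize_t sys_process_madvise(int pidfd, const struct iovec *iovec,
                                   size_t vlen, int advice,
                                   unsigned int flags)
{
    return syscall(SYS_process_madvise, pidfd, iovec, vlen, advice, flags);
}
```

The pidfd argument would normally come from pidfd_open(2), and flags must
be 0. On kernels without the system call, the wrapper fails with ENOSYS.
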
> SEE ALSO
> madvise(2), pidfd_open(2), process_vm_readv(2),
> process_vm_writev(2)
>
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/