Re: [PATCH v3 1/1] process_madvise.2: Add process_madvise man page

From: Michael Kerrisk (man-pages)
Date: Thu Feb 18 2021 - 04:00:00 EST


Hello Suren,

>> Thanks. I added a few words to clarify this.
>
> Any link where I can see the final version?

Sure:
https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/man2/process_madvise.2

Also rendered below.

Thanks,

Michael

NAME
process_madvise - give advice about use of memory to a process

SYNOPSIS
#include <sys/uio.h>

ssize_t process_madvise(int pidfd, const struct iovec *iovec,
size_t vlen, int advice,
unsigned int flags);

Note: There is no glibc wrapper for this system call; see NOTES.

DESCRIPTION
The process_madvise() system call is used to give advice or direc‐
tions to the kernel about the address ranges of another process or
of the calling process. It provides the advice for the address
ranges described by iovec and vlen. The goal of such advice is to
improve system or application performance.

The pidfd argument is a PID file descriptor (see pidfd_open(2))
that specifies the process to which the advice is to be applied.
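As a rough sketch of obtaining such a descriptor, the fragment below
calls pidfd_open(2) through syscall(2). The helper name open_pidfd()
is hypothetical, and the fallback number 434 is an assumption that is
used only when the installed headers do not define SYS_pidfd_open.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434      /* Assumed fallback for old headers */
#endif

/* Hypothetical helper: returns a PID file descriptor referring to
   'pid', or -1 with errno set by the kernel. */
static int
open_pidfd(pid_t pid)
{
    /* The second argument is the flags value; 0 requests defaults. */
    return (int) syscall(SYS_pidfd_open, pid, 0U);
}
```

The returned descriptor should be closed with close(2) once it is no
longer needed.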

The pointer iovec points to an array of iovec structures, defined
in <sys/uio.h> as:

struct iovec {
void *iov_base; /* Starting address */
size_t iov_len; /* Length of region */
};

Each iovec structure describes an address range that begins at the
address iov_base and is iov_len bytes long.
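To illustrate how such an array might be filled in, here is a minimal
sketch that describes consecutive, equally sized chunks. The helper
fill_chunks() and its parameters are hypothetical. When advising
another process, the addresses stored in iov_base are meaningful in
that process's address space, not the caller's.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical helper: make iov[0..n-1] describe n consecutive
   ranges of chunk_len bytes each, starting at address 'base'. */
static void
fill_chunks(struct iovec *iov, size_t n, void *base, size_t chunk_len)
{
    char *p = base;

    for (size_t i = 0; i < n; i++) {
        iov[i].iov_base = p + i * chunk_len;    /* Starting address */
        iov[i].iov_len = chunk_len;             /* Length of region */
    }
}
```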

The vlen argument specifies the number of elements in the iovec array.
This value must be less than or equal to IOV_MAX (defined in <lim‐
its.h> or accessible via the call sysconf(_SC_IOV_MAX)).
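A caller could verify this limit before making the call. The sketch
below is one possible way to do so; the helper name
vlen_is_acceptable() is hypothetical, and the final fallback of 1024
is an assumption that applies only if neither sysconf() nor
<limits.h> supplies a value.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: report whether vlen is within IOV_MAX. */
static bool
vlen_is_acceptable(size_t vlen)
{
    long max = sysconf(_SC_IOV_MAX);

    if (max == -1) {
#ifdef IOV_MAX
        max = IOV_MAX;          /* Compile-time limit, if defined */
#else
        max = 1024;             /* Assumed fallback value */
#endif
    }

    return vlen <= (size_t) max;
}
```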

The advice argument is one of the following values:

MADV_COLD
See madvise(2).

MADV_PAGEOUT
See madvise(2).

The flags argument is reserved for future use; currently, this ar‐
gument must be specified as 0.

The vlen and iovec arguments are checked before applying any ad‐
vice. If vlen is too big, or iovec is invalid, then an error will
be returned immediately and no advice will be applied.

The advice might be applied to only a part of iovec if one of its
elements points to an invalid memory region in the remote process.
No further elements will be processed beyond that point. (See the
discussion regarding partial advice in RETURN VALUE.)

Permission to apply advice to another process is governed by a
ptrace access mode PTRACE_MODE_READ_REALCREDS check (see
ptrace(2)); in addition, because of the performance implications
of applying the advice, the caller must have the CAP_SYS_ADMIN ca‐
pability.

RETURN VALUE
On success, process_madvise() returns the number of bytes advised.
This return value may be less than the total number of requested
bytes, if an error occurred after some iovec elements were already
processed. The caller should check the return value to determine
whether the advice was applied only partially.

On error, -1 is returned and errno is set to indicate the error.

ERRORS
EBADF pidfd is not a valid PID file descriptor.

EFAULT The memory described by iovec is outside the accessible ad‐
dress space of the process referred to by pidfd.

EINVAL flags is not 0.

EINVAL The sum of the iov_len values of iovec overflows a ssize_t
value.

EINVAL vlen is too large.

ENOMEM Could not allocate memory for internal copies of the iovec
structures.

EPERM The caller does not have permission to access the address
space of the process referred to by pidfd.

ESRCH The target process does not exist (i.e., it has terminated
and been waited on).

VERSIONS
This system call first appeared in Linux 5.10. Support for this
system call is optional, depending on the setting of the CON‐
FIG_ADVISE_SYSCALLS configuration option.

CONFORMING TO
The process_madvise() system call is Linux-specific.

NOTES
Glibc does not provide a wrapper for this system call; call it us‐
ing syscall(2).
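
A minimal sketch of such a call follows. The wrapper name
my_process_madvise() is hypothetical. The fallback definitions (440
for the system call number, 20 and 21 for the advice values) are
assumptions that take effect only when the installed headers predate
this system call; the system call number in particular may differ on
some architectures.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Assumed fallbacks for headers that predate this system call */
#ifndef SYS_process_madvise
#define SYS_process_madvise 440
#endif
#ifndef MADV_COLD
#define MADV_COLD 20
#endif
#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21
#endif

/* Hypothetical wrapper: passes its arguments unchanged to the
   kernel. Returns the number of bytes advised, or -1 with errno
   set. */
static ssize_t
my_process_madvise(int pidfd, const struct iovec *iovec, size_t vlen,
                   int advice, unsigned int flags)
{
    return syscall(SYS_process_madvise, pidfd, iovec, vlen, advice,
                   flags);
}
```

With this in place, a caller might write
my_process_madvise(pidfd, iov, vlen, MADV_COLD, 0) and then inspect
the return value as described in RETURN VALUE.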

SEE ALSO
madvise(2), pidfd_open(2), process_vm_readv(2),
process_vm_writev(2)


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/