Re: [PATCH v3 1/1] process_madvise.2: Add process_madvise man page
From: Michael Kerrisk (man-pages)
Date: Tue Feb 02 2021 - 05:46:10 EST
Hello Suren (and Minchan and Michal)
Thank you for the revisions!
I've applied this patch, and done a few light edits.
However, I have a question about undocumented pieces in *madvise(2)*,
as well as one other question. See below.
On 2/2/21 6:30 AM, Suren Baghdasaryan wrote:
> Initial version of process_madvise(2) manual page. Initial text was
> extracted from [1], amended after fix [2] and more details added using
> man pages of madvise(2) and process_vm_read(2) as examples. It also
> includes the changes to required permission proposed in [3].
>
> [1] https://lore.kernel.org/patchwork/patch/1297933/
> [2] https://lkml.org/lkml/2020/12/8/1282
> [3] https://patchwork.kernel.org/project/selinux/patch/20210111170622.2613577-1-surenb@xxxxxxxxxx/#23888311
>
> Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> Reviewed-by: Michal Hocko <mhocko@xxxxxxxx>
> ---
> changes in v2:
> - Changed description of MADV_COLD per Michal Hocko's suggestion
> - Applied fixes suggested by Michael Kerrisk
> changes in v3:
> - Added Michal's Reviewed-by
> - Applied additional fixes suggested by Michael Kerrisk
>
> NAME
> process_madvise - give advice about use of memory to a process
>
> SYNOPSIS
> #include <sys/uio.h>
>
> ssize_t process_madvise(int pidfd,
> const struct iovec *iovec,
> unsigned long vlen,
> int advice,
> unsigned int flags);
>
> DESCRIPTION
> The process_madvise() system call is used to give advice or directions
> to the kernel about the address ranges of another process or the calling
> process. It provides the advice to the address ranges described by iovec
> and vlen. The goal of such advice is to improve system or application
> performance.
>
> The pidfd argument is a PID file descriptor (see pidfd_open(2)) that
> specifies the process to which the advice is to be applied.
>
> The pointer iovec points to an array of iovec structures, defined in
> <sys/uio.h> as:
>
> struct iovec {
> void *iov_base; /* Starting address */
> size_t iov_len; /* Number of bytes to transfer */
> };
>
> The iovec structure describes address ranges beginning at iov_base address
> and with the size of iov_len bytes.
>
> The vlen represents the number of elements in the iovec structure.
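To make the iovec/vlen description above concrete, here is a minimal
sketch of how a caller might describe two ranges and invoke the system
call. It assumes that no libc wrapper is available, so it goes through
syscall(2). The fallback values 440 and 20 and the helper names
raw_process_madvise(), iovec_total_len() and describe_two_ranges() are
my own assumptions for illustration; none of them come from the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef SYS_process_madvise
#define SYS_process_madvise 440  /* assumed: generic syscall number */
#endif
#ifndef MADV_COLD
#define MADV_COLD 20             /* assumed: value in the Linux UAPI headers */
#endif

/* Hypothetical thin wrapper around the raw system call. */
ssize_t raw_process_madvise(int pidfd, const struct iovec *iov,
                            unsigned long vlen, int advice,
                            unsigned int flags)
{
    return syscall(SYS_process_madvise, pidfd, iov, vlen, advice, flags);
}

/* Sum of iov_len over all vlen elements, i.e. the total number of
   bytes the caller is asking the kernel to advise. */
size_t iovec_total_len(const struct iovec *iov, unsigned long vlen)
{
    size_t total = 0;

    for (unsigned long i = 0; i < vlen; i++)
        total += iov[i].iov_len;
    return total;
}

/* Fill out[] with two ranges inside buf: one page at offset 0 and
   two pages starting at offset 2 * page. Returns the element count
   to pass as vlen. */
unsigned long describe_two_ranges(char *buf, size_t page,
                                  struct iovec out[2])
{
    assert(page > 0);
    out[0].iov_base = buf;
    out[0].iov_len = page;
    out[1].iov_base = buf + 2 * page;
    out[1].iov_len = 2 * page;
    return 2;
}
```

A caller would then pass the array and the count together, for example
raw_process_madvise(pidfd, iov, n, MADV_COLD, 0), where pidfd comes
from pidfd_open(2), and compare the result against
iovec_total_len(iov, n).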
>
> The advice argument is one of the values listed below.
>
> Linux-specific advice values
> The following Linux-specific advice values have no counterparts in the
> POSIX-specified posix_madvise(3), and may or may not have counterparts
> in the madvise(2) interface available on other implementations.
>
> MADV_COLD (since Linux 5.4.1)
I just noticed these version numbers now, and thought: they can't be
right (because the system call appeared only in v5.11). So I removed
them. But of course, in another sense, the version numbers are (nearly)
right, since these advice values were added for madvise(2) in Linux 5.4.
However, they are not documented in the madvise(2) manual page. Is it
correct to assume that MADV_COLD and MADV_PAGEOUT have exactly the same
meaning in madvise(2) (but just for the calling process, of course)?
> Deactive a given range of pages which will make them a more probable
I changed: s/Deactive/Deactivate/
> reclaim target should there be a memory pressure. This is a
> nondestructive operation. The advice might be ignored for some pages
> in the range when it is not applicable.
>
> MADV_PAGEOUT (since Linux 5.4.1)
> Reclaim a given range of pages. This is done to free up memory occupied
> by these pages. If a page is anonymous it will be swapped out. If a
> page is file-backed and dirty it will be written back to the backing
> storage. The advice might be ignored for some pages in the range when
> it is not applicable.
[...]
> The hint might be applied to a part of iovec if one of its elements points
> to an invalid memory region in the remote process. No further elements will
> be processed beyond that point.
Is the above scenario the one that leads to the partial advice case described in
RETURN VALUE? If yes, perhaps I should add some words to make that clearer.
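If that reading is correct, a caller could work out where processing
stopped from the byte count alone. The sketch below assumes that the
return value counts bytes of iovec elements in array order and that
elements are either advised in full or not at all; both are my
assumptions pending confirmation, and first_unadvised() is a
hypothetical helper, not part of any API.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Given the iovec array that was passed to process_madvise() and the
 * call's non-negative return value (bytes advised), return the index
 * of the first element that was not fully covered, or vlen if every
 * element was covered.
 */
size_t first_unadvised(const struct iovec *iov, size_t vlen,
                       ssize_t advised)
{
    assert(advised >= 0);

    size_t remaining = (size_t)advised;

    for (size_t i = 0; i < vlen; i++) {
        if (remaining < iov[i].iov_len)
            return i;               /* this element was not completed */
        remaining -= iov[i].iov_len;
    }
    return vlen;                    /* all elements were advised */
}
```

A result smaller than vlen would then identify the element to inspect
(or to retry from) after a partial advice.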
You can see the light edits that I made in
https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=e3ce016472a1b3ec5dffdeb23c98b9fef618a97b
and following that I restructured DESCRIPTION a little in
https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=3aac0708a9acee5283e091461de6a8410bc921a6
Thanks,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/