Re: [PATCH] mm: madvise: return correct bytes advised with process_madvise

From: Charan Teja Kalla
Date: Thu Mar 10 2022 - 04:35:00 EST


Thanks Nadav for the inputs!!

On 3/10/2022 12:20 AM, Nadav Amit wrote:
> ---
> mm/madvise.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 38d0f51..d3b49b3 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1426,15 +1426,21 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
>
> while (iov_iter_count(&iter)) {
> iovec = iov_iter_iovec(&iter);
> + /*
> + * Even when [start, end) passed to do_madvise covers
> + * some unmapped addresses, it continues processing and
> + * returns ENOMEM at the end. Thus consider the range
> + * as processed when do_madvise() returns ENOMEM.
> + * This makes process_madvise() never return ENOMEM.
> + */
>
> I fully understand and relate to the basic motivation of this
> patch.
>
> The ENOMEM that this patch checks for, IIUC, is the ENOMEM that is
> returned on unmapped holes. Such ENOMEM does not appear, according to
> the man page, to be a valid reason to return ENOMEM to userspace.
> Presumably process_madvise() is expected to skip unmapped holes
> and not to fail because of them.

True, that ENOMEM indicates that the range passed contains unmapped holes.
Quoting the documentation of do_madvise():
* -ENOMEM - addresses in the specified range are not currently
* mapped, or are outside the AS of the process.

Internally, process_madvise() calls do_madvise() in a loop, once for each
range it received in the 'struct iovec' array. And I agree that
process_madvise() is expected to skip over the unmapped holes.
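To make the two behaviours concrete, here is a rough userspace model of that loop. It is a sketch, not kernel code: `struct range`, the `mapped[]` table, `fake_do_madvise()` and `madvise_vec()` are hypothetical stand-ins. It assumes, as the patch comment describes, that do_madvise() advises the mapped parts of a range and returns -ENOMEM at the end if the range had holes. The `skip_enomem` flag switches between stopping at the first error (current behaviour) and treating -ENOMEM as "range processed" (the patch).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one iovec entry: [start, start + len). */
struct range {
	unsigned long start;
	unsigned long len;
};

/* Hypothetical address space: sorted, non-overlapping mapped intervals. */
static const struct range mapped[] = {
	{ 0x1000, 0x3000 },	/* [0x1000, 0x4000) */
	{ 0x6000, 0x2000 },	/* [0x6000, 0x8000) */
};
#define NR_MAPPED (sizeof(mapped) / sizeof(mapped[0]))

/*
 * Stand-in for do_madvise(): "advises" every mapped part of the range,
 * then returns -ENOMEM if any part of it was unmapped, 0 otherwise.
 */
static int fake_do_madvise(unsigned long start, unsigned long len)
{
	unsigned long end = start + len, covered = 0;

	for (size_t i = 0; i < NR_MAPPED; i++) {
		unsigned long ms = mapped[i].start, me = ms + mapped[i].len;
		unsigned long lo = start > ms ? start : ms;
		unsigned long hi = end < me ? end : me;

		if (lo < hi)
			covered += hi - lo;	/* advice applied to this part */
	}
	return covered == len ? 0 : -ENOMEM;
}

/*
 * Model of the process_madvise() loop. skip_enomem == false stops at the
 * first error; skip_enomem == true counts an -ENOMEM range as processed.
 */
static long madvise_vec(const struct range *vec, size_t vlen, bool skip_enomem)
{
	unsigned long total_len = 0, processed = 0;
	int ret = 0;

	for (size_t i = 0; i < vlen; i++)
		total_len += vec[i].len;

	for (size_t i = 0; i < vlen; i++) {
		ret = fake_do_madvise(vec[i].start, vec[i].len);
		if (ret < 0 && !(skip_enomem && ret == -ENOMEM))
			break;
		processed += vec[i].len;
	}
	assert(processed <= total_len);

	/* As in the kernel: bytes advised if any, otherwise the error. */
	return processed ? (long)processed : ret;
}
```

With entries {0x1000, 0x1000}, {0x3000, 0x2000} and {0x6000, 0x1000}, the middle one straddles a hole: the first mode reports only 4096 bytes, while the ENOMEM-skipping mode reports all 16384. A vector consisting only of a hole returns -ENOMEM in the first mode and its length in the second.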

> Having said that, I do not think that the check that the patch does
> is clean or clearly documented.

If the concern is the documentation, how about adding: "Since
process_madvise() is expected to skip over unmapped holes, never return
the ENOMEM received from do_madvise() to userspace". If the code change
itself can be made cleaner, please suggest how.

>
> In addition, this patch (and some work on process_madvise()) raise
> in my mind a couple of questions:
>
> 1. There are other errors that process_madvise might encounter
> and can be propagated back to userspace, but are not
> documented. For instance, if can_madv_lru_vma() fails on
> MADV_COLD, userspace will get EINVAL. EINVAL is not documented
> as a valid error code for such a case in either the madvise() or
> process_madvise() man page.

I agree about the man pages too, and felt the same while going through
them. For the case you mention, the madvise[1] man page lists EINVAL only
for MADV_DONTNEED and MADV_REMOVE. It should also cover MADV_PAGEOUT,
MADV_COLD and MADV_FREE. The other undocumented error codes I came
across in process_madvise() are:
EINVAL - returned when process_madvise_behavior_valid() rejects the advice.
EINTR - from mm_access()
EACCES - from mm_access()
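To check which of these codes a caller actually sees, a minimal userspace sketch could look like the following. It assumes the installed headers define SYS_process_madvise (otherwise the wrapper just returns -ENOSYS). `try_process_madvise()` and `process_madvise_errno_source()` are hypothetical helpers, and the errno-to-origin table only reflects the cases discussed in this thread, not an authoritative list.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Call process_madvise() on a single range via the raw syscall, since
 * older C libraries have no wrapper. Returns the number of bytes
 * advised, or -errno on failure.
 */
static long try_process_madvise(int pidfd, void *addr, size_t len,
				int advice, unsigned int flags)
{
#ifdef SYS_process_madvise
	struct iovec iov = { .iov_base = addr, .iov_len = len };
	long ret = syscall(SYS_process_madvise, pidfd, &iov, (size_t)1,
			   advice, flags);

	return ret < 0 ? -errno : ret;
#else
	(void)pidfd; (void)addr; (void)len; (void)advice; (void)flags;
	return -ENOSYS;	/* installed headers predate process_madvise() */
#endif
}

/*
 * Hypothetical helper: where a (positive) errno may originate, per the
 * cases discussed in this thread. Not exhaustive.
 */
static const char *process_madvise_errno_source(int err)
{
	assert(err > 0);	/* pass errno, not the negated return value */

	switch (err) {
	case EINVAL:
		return "non-zero flags, process_madvise_behavior_valid(), "
		       "or do_madvise() (e.g. can_madv_lru_vma() on MADV_COLD)";
	case EINTR:
		return "mm_access() interrupted by a fatal signal";
	case EACCES:
		return "mm_access() permission check";
	case ENOMEM:
		return "unmapped holes seen by do_madvise(), "
		       "or a failed internal allocation";
	default:
		return "other; see the process_madvise(2) man page";
	}
}
```

For example, passing an invalid pidfd such as -1 should fail before any advice is applied, and `process_madvise_errno_source(-ret)` then describes the code that came back.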

Thanks,
Charan