Re: [PATCH] mm: madvise: return correct bytes advised with process_madvise

From: Minchan Kim
Date: Wed Mar 09 2022 - 12:00:47 EST


On Wed, Mar 09, 2022 at 10:57:59AM +0530, Charan Teja Kalla wrote:
> The process_madvise() system call returns error even after processing
> some VMA's passed in the 'struct iovec' vector list which leaves the
> user confused to know where to restart the advise next. It is also
> against this syscall man page[1] documentation where it mentions that
> "return value may be less than the total number of requested bytes, if
> an error occurred after some iovec elements were already processed.".
>
> Consider a user passed 10 VMA's in the 'struct iovec' vector list of
> which 9 are processed but one. Then it just returns the error caused on
> that failed VMA despite the first 9 VMA's processed, leaving the user
> confused about on which VMA it is failed. Returning the number of bytes
> processed here can help the user to know which VMA it is failed on and
> thus can retry/skip the advise on that VMA.
>
> [1]https://man7.org/linux/man-pages/man2/process_madvise.2.html.
>
> Fixes: ecb8ac8b1f14("mm/madvise: introduce process_madvise() syscall: an external memory hinting API"
> Signed-off-by: Charan Teja Kalla <quic_charante@xxxxxxxxxxx>
> ---
> mm/madvise.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 38d0f51..d3b49b3 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1426,15 +1426,21 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
>
> while (iov_iter_count(&iter)) {
> iovec = iov_iter_iovec(&iter);
> + /*
> + * Even when [start, end) passed to do_madvise covers
> + * some unmapped addresses, it continues processing with
> + * returning ENOMEM at the end. Thus consider the range
> + * as processed when do_madvise() returns ENOMEM.
> + * This makes process_madvise() never returns ENOMEM.
> + */

Looks like that this patch has two things. first, returns processed
bytes instead of error in case of error. Second, keep working on
rest vmas on -ENOMEM due to unmapped hole.

First thing totally makes sense to me(that's exactly I wanted to
do but somehow missed) so it should go stable tree. However,
second stuff might be arguble so it would be great if you split
the patch.

> ret = do_madvise(mm, (unsigned long)iovec.iov_base,
> iovec.iov_len, behavior);
> - if (ret < 0)
> + if (ret < 0 && ret != -ENOMEM)
> break;
> iov_iter_advance(&iter, iovec.iov_len);
> }
>
> - if (ret == 0)
> - ret = total_len - iov_iter_count(&iter);
> + ret = (total_len - iov_iter_count(&iter)) ? : ret;
>
> release_mm:
> mmput(mm);
> --
> 2.7.4
>