Re: [PATCH 1/2] mm/mempolicy: Allow lookup_node() to handle fatal signal

From: Peter Xu
Date: Tue Apr 14 2020 - 09:49:53 EST


On Tue, Apr 14, 2020 at 01:04:29PM +0200, Michal Hocko wrote:

[...]

> @@ -1247,6 +1248,10 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
> }
> EXPORT_SYMBOL_GPL(fixup_user_fault);
>
> +/*
> + * Please note that this function, unlike __get_user_pages will not
> + * return 0 for nr_pages > 0 without FOLL_NOWAIT
> + */
> static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
> struct mm_struct *mm,
> unsigned long start,
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 48ba9729062e..1965e2681877 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -927,10 +927,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
>
> int locked = 1;
> err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
> - if (err == 0) {
> - /* E.g. GUP interrupted by fatal signal */
> - err = -EFAULT;
> - } else if (err > 0) {
> + if (err > 0) {
> err = page_to_nid(p);
> put_page(p);
> }

Hi, Michal,

IIUC this is not the only place that we check against ret==0 for gup.
For example, the other direct caller of the same function,
get_vaddr_frames(), which will set -EFAULT too if ret==0. So do we
want to change all the places and don't check against zero explicitly?

I'm now thinking whether this would be good even if we refactored gup
and only allow it to return either >0 as number of page pinned, or <0
for all the rest. I'm not sure how others will see this, but the
answer is probably the same at least to me as before for this issue.

As a caller, I'll see gup as a black box. Even if the gup function
guarantees that the retcode won't be zero and documented it, I (as a
caller) will be using that to index page array so I'd still better to
check that value before I do anything (because it's meaningless to
index an array with zero size), and a convertion of "ret==0" -->
"-EFAULT" (or some other failures) in this case still makes sense.
While removing that doesn't help a lot, imho, but instead make it
slightly unsafer.

Maybe that's also why ret==0 hasn't been reworked for years? Maybe
there is just never a reason strong enough to do that explicitly,
because it's still good to check against ret==0 after all...

Thanks,

--
Peter Xu