Re: [PATCH v4 09/23] mm/gup: introduce pin_user_pages*() and FOLL_PIN

From: Jan Kara
Date: Wed Nov 13 2019 - 05:43:22 EST


On Tue 12-11-19 20:26:56, John Hubbard wrote:
> Introduce pin_user_pages*() variations of get_user_pages*() calls,
> and also pin_longterm_pages*() variations.
>
> These variants all set FOLL_PIN, which is also introduced, and
> thoroughly documented.
>
> The pin_longterm*() variants also set FOLL_LONGTERM, in addition
> to FOLL_PIN:
>
> pin_user_pages()
> pin_user_pages_remote()
> pin_user_pages_fast()
>
> pin_longterm_pages()
> pin_longterm_pages_remote()
> pin_longterm_pages_fast()
>
> All pages that are pinned via the above calls, must be unpinned via
> put_user_page().
>
> The underlying rules are:
>
> * These are gup-internal flags, so the call sites should not directly
> set FOLL_PIN nor FOLL_LONGTERM. That behavior is enforced with
> assertions, for the new FOLL_PIN flag. However, for the pre-existing
> FOLL_LONGTERM flag, which has some call sites that still directly
> set FOLL_LONGTERM, there is no assertion yet.
>
> * Call sites that want to indicate that they are going to do DirectIO
> ("DIO") or something with similar characteristics, should call a
> get_user_pages()-like wrapper call that sets FOLL_PIN. These wrappers
> will:
> * Start with "pin_user_pages" instead of "get_user_pages". That
> makes it easy to find and audit the call sites.
> * Set FOLL_PIN
>
> * For pages that are received via FOLL_PIN, those pages must be returned
> via put_user_page().
>
> Thanks to Jan Kara and Vlastimil Babka for explaining the 4 cases
> in this documentation. (I've reworded it and expanded upon it.)
>
> Reviewed-by: Mike Rapoport <rppt@xxxxxxxxxxxxx> # Documentation
> Reviewed-by: Jérôme Glisse <jglisse@xxxxxxxxxx>
> Cc: Jonathan Corbet <corbet@xxxxxxx>
> Cc: Ira Weiny <ira.weiny@xxxxxxxxx>
> Signed-off-by: John Hubbard <jhubbard@xxxxxxxxxx>

Thanks for the documentation. It looks great!

> diff --git a/mm/gup.c b/mm/gup.c
> index 83702b2e86c8..4409e84dff51 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -201,6 +201,10 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
> spinlock_t *ptl;
> pte_t *ptep, pte;
>
> + /* FOLL_GET and FOLL_PIN are mutually exclusive. */
> + if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
> + (FOLL_PIN | FOLL_GET)))
> + return ERR_PTR(-EINVAL);
> retry:
> if (unlikely(pmd_bad(*pmd)))
> return no_page_table(vma, flags);

How does FOLL_PIN result in grabbing (at least normal, for now) page reference?
I didn't find that anywhere in this patch but it is a prerequisite to
converting any user to pin_user_pages() interface, right?

> +/**
> + * pin_user_pages_fast() - pin user pages in memory without taking locks
> + *
> + * Nearly the same as get_user_pages_fast(), except that FOLL_PIN is set. See
> + * get_user_pages_fast() for documentation on the function arguments, because
> + * the arguments here are identical.
> + *
> + * FOLL_PIN means that the pages must be released via put_user_page(). Please
> + * see Documentation/vm/pin_user_pages.rst for further details.
> + *
> + * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It
> + * is NOT intended for Case 2 (RDMA: long-term pins).
> + */
> +int pin_user_pages_fast(unsigned long start, int nr_pages,
> + unsigned int gup_flags, struct page **pages)
> +{
> + /* FOLL_GET and FOLL_PIN are mutually exclusive. */
> + if (WARN_ON_ONCE(gup_flags & FOLL_GET))
> + return -EINVAL;
> +
> + gup_flags |= FOLL_PIN;
> + return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages);
> +}
> +EXPORT_SYMBOL_GPL(pin_user_pages_fast);

I was somewhat wondering about the number of functions you add here. So we
have:

pin_user_pages()
pin_user_pages_fast()
pin_user_pages_remote()

and then longterm variants:

pin_longterm_pages()
pin_longterm_pages_fast()
pin_longterm_pages_remote()

and obviously we have gup like:
get_user_pages()
get_user_pages_fast()
get_user_pages_remote()
... and some other gup variants ...

I think we really should have pin_* vs get_* variants as they are very
different in terms of guarantees and after conversion, any use of get_*
variant in non-mm code should be closely scrutinized. OTOH pin_longterm_*
don't look *that* useful to me and just using pin_* instead with
FOLL_LONGTERM flag would look OK to me and somewhat reduce the number of
functions which is already large enough? What do people think? I don't feel
too strongly about this but wanted to bring this up.

Honza



--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR