Re: [PATCH 1/6] mm/gup: introduce pin_user_page()

From: John Hubbard
Date: Tue Aug 30 2022 - 17:42:29 EST


On 8/30/22 05:17, David Hildenbrand wrote:
> On 29.08.22 21:33, John Hubbard wrote:
>> On 8/29/22 05:07, David Hildenbrand wrote:
>>>> +/**
>>>> + * pin_user_page() - apply a FOLL_PIN reference to a page
>>>> + *
>>>> + * @page: the page to be pinned.
>>>> + *
>>>> + * This is similar to get_user_pages(), except that the page's refcount is
>>>> + * elevated using FOLL_PIN, instead of FOLL_GET.
>>
>> Actually, my commit log has a more useful documentation of this routine,
>> and given the questions below, I think I'll change to that:
>>
>> * pin_user_page() is an externally-usable version of try_grab_page(), but with
>> * semantics that match get_page(), so that it can act as a drop-in replacement
>> * for get_page().
>> *
>> * pin_user_page() elevates a page's refcount using FOLL_PIN rules. This means
>> * that the caller must release the page via unpin_user_page().
>
> Some thoughts:
>
> a) Can we generalize such that pages with a dedicated pincount
> (multi-page folios) are also covered? Maybe avoiding the refcount
> terminology would be best.
>
> b) Should we directly work on folios?
>
> c) Would it be valid to pass in a tail page right now?

I would fervently prefer to defer those kinds of questions and ideas,
because the call sites are dealing simply in pages. And this is
really for file system call sites. The folio conversion is a larger
thing. Below...

>
>>
>>>> + *
>>>> + * IMPORTANT: The caller must release the page via unpin_user_page().
>>>> + *
>>>> + */
>>>> +void pin_user_page(struct page *page)
>>>> +{
>>>> + struct folio *folio = page_folio(page);
>>>> +
>>>> + WARN_ON_ONCE(folio_ref_count(folio) <= 0);
>>>> +
>>>
>>> We should warn if the page is anon and !exclusive.
>>
>> That would be sort of OK, because pin_user_page() is being created
>> specifically for file system (O_DIRECT cases) use, and so the pages
>> should mostly be file-backed, rather than anon. Although I'm a little
>> vague about whether all of these iov_iter cases are really always
>> file-backed pages, especially for cases such as splice(2) to an
>> O_DIRECT-opened file, that Al Viro mentioned [1].
>
> If we can, we should document that this interface is not for anonymous
> pages and WARN if pinning an anonymous page via this interface.

Yes. OK, I've rewritten the documentation again, and changed the
warning to just

WARN_ON_ONCE(PageAnon(page));

So it looks like this now, what do you think?

/**
* pin_user_page() - apply a FOLL_PIN reference to a file-backed page that the
* caller already owns.
*
* @page: the page to be pinned.
*
* pin_user_page() elevates a page's refcount using FOLL_PIN rules. This means
* that the caller must release the page via unpin_user_page().
*
* pin_user_page() is intended as a drop-in replacement for get_page(). This
* provides a way for callers to do a subsequent unpin_user_page() on the
* affected page. However, it is only intended for use by callers (file systems,
* block/bio) that have a file-backed page. Anonymous pages are not expected nor
* supported, and will generate a warning.
*
* pin_user_page() may also be thought of as an externally-usable version of
* try_grab_page(), but with semantics that match get_page(), so that it can act
* as a drop-in replacement for get_page().
*
* IMPORTANT: The caller must release the page via unpin_user_page().
*
*/

>
> The only reasonable way to obtain a pin on an anonymous page is via the
> page table. Here, FOLL_PIN should be used to do the right thing -- for
> example, unshare first (break COW) instead of pinning a shared anonymous
> page.
>
> Nothing would speak against duplicating such a pin using this interface
> (we'd have to sanity check that the page we're pinning may already be
> pinned), but I assume the pages we pin here are *not* necessarily
> obtained via GUP FOLL_PIN.
>
> I would be curious under which scenarios we could end up here with an
> anonymous page and how we obtained that reference (if not via GUP).

Let's see if the warning ever fires. I expect not, but if it does,
then I can add the !PageAnonExclusive qualifier to the warning.

>
>>
>> Can you walk me through the reasoning for why we need to keep out
>> anon shared pages?
>
> We make sure to only pin anonymous pages that are exclusive and check
> when unpinning -- see sanity_check_pinned_pages(), there is also a
> comment in there -- that pinned anonymous pages are in fact still
> exclusive, otherwise we might have a BUG lurking somewhere that can
> result in memory corruptions or leaking information between processes.
>
> For example, once we'd pinned an anonymous pages that are not marked
> exclusive (!PageAnonExclusive), or we'd be sharing a page that is
> pinned, the next write fault would replace the page in the user page
> table due to breaking COW, and the GUP pin would point at a different
> page than the page table.

Right, OK it all clicks together, thanks for that.

>
> Disallowing pinning of anon pages that may be shared in any case
> (FOLL_LONGTERM or not) simplifies GUP handling and allows for such
> sanity checks.
>
> (side note: after recent VM_BUG_ON discussions we might want to convert
> the VM_BUG_ON_PAGE in sanity_check_pinned_pages())
>
>>
>>>
>>> I assume the intend is to use pin_user_page() only to duplicate pins, right?
>>>
>>
>> Well, yes or no, depending on your use of the term "pin":
>>
>> pin_user_page() is used on a page that already has a refcount >= 1 (so
>> no worries about speculative pinning should apply here), but the page
>> does not necessarily have any FOLL_PIN's applied to it yet (so it's not
>> "pinned" in the FOLL_PIN sense).
>
> Okay, then we should really figure out if/how anonymous pages could end
> up here. I assume they can't, really. But let's see :)
>
>

Yes, again, the warning will reveal that. Which reminds me that I also
have it on my list to also write some test cases that do things like Al
Viro suggested: ITER_BVEC + O_DIRECT.


thanks,

--
John Hubbard
NVIDIA