Re: [PATCH v1 2/3] mm/gup: use gup_can_follow_protnone() also in GUP-fast

From: David Hildenbrand
Date: Tue Aug 30 2022 - 15:26:12 EST


On 30.08.22 21:18, John Hubbard wrote:
> On 8/30/22 11:53, David Hildenbrand wrote:
>> Good, I managed to attract the attention of someone who understands that machinery :)
>>
>> While validating whether GUP-fast and PageAnonExclusive code work correctly,
>> I started looking at the whole RCU GUP-fast machinery. I do have a patch to
>> improve PageAnonExclusive clearing (I think we're missing memory barriers to
>> make it work as expected in any possible case), but I also stumbled eventually
>> over a more generic issue that might need memory barriers.
>>
>> Any thoughts whether I am missing something or this is actually missing
>> memory barriers?
>>
>
> It's actually missing memory barriers.
>
> In fact, others have had that same thought! [1] :) In that 2019 thread,
> I recall that this got dismissed because of a focus on the IPI-based
> aspect of gup fast synchronization (there was some hand waving, perhaps
> accurate waving, about memory barriers vs. CPU interrupts). But now the
> RCU (non-IPI) implementation is more widely used than it used to be, the
> issue is clearer.
>
>>
>> From ce8c941c11d1f60cea87a3e4d941041dc6b79900 Mon Sep 17 00:00:00 2001
>> From: David Hildenbrand <david@xxxxxxxxxx>
>> Date: Mon, 29 Aug 2022 16:57:07 +0200
>> Subject: [PATCH] mm/gup: update refcount+pincount before testing if the PTE
>> changed
>>
>> mm/ksm.c:write_protect_page() has to make sure that no unknown
>> references to a mapped page exist and that no additional ones with write
>> permissions are possible -- unknown references could have write permissions
>> and modify the page afterwards.
>>
>> Conceptually, mm/ksm.c:write_protect_page() consists of:
>> (1) Clear/invalidate PTE
>> (2) Check if there are unknown references; back off if so.
>> (3) Update PTE (e.g., map it R/O)
>>
>> Conceptually, GUP-fast code consists of:
>> (1) Read the PTE
>> (2) Increment refcount/pincount of the mapped page
>> (3) Check if the PTE changed by re-reading it; back off if so.
>>
>> To make sure GUP-fast won't be able to grab additional references after
>> clearing the PTE, but will properly detect the change and back off, we
>> need a memory barrier between updating the recount/pincount and checking
>> if it changed.
>>
>> try_grab_folio() doesn't necessarily imply a memory barrier, so add an
>> explicit smp_mb__after_atomic() after the atomic RMW operation to
>> increment the refcount and pincount.
>>
>> ptep_clear_flush() used to clear the PTE and flush the TLB should imply
>> a memory barrier for flushing the TLB, so don't add another one for now.
>>
>> PageAnonExclusive handling requires further care and will be handled
>> separately.
>>
>> Fixes: 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()")
>> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
>> ---
>> mm/gup.c | 17 +++++++++++++++++
>> 1 file changed, 17 insertions(+)
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 5abdaf487460..0008b808f484 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -2392,6 +2392,14 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>> goto pte_unmap;
>> }
>>
>> + /*
>> + * Update refcount/pincount before testing for changed PTE. This
>> + * is required for code like mm/ksm.c:write_protect_page() that
>> + * wants to make sure that a page has no unknown references
>> + * after clearing the PTE.
>> + */
>> + smp_mb__after_atomic();
>> +
>> if (unlikely(pte_val(pte) != pte_val(*ptep))) {
>> gup_put_folio(folio, 1, flags);
>> goto pte_unmap;
>> @@ -2577,6 +2585,9 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
>> if (!folio)
>> return 0;
>>
>> + /* See gup_pte_range(). */
>
> Don't we usually also identify what each mb pairs with, in the comments? That would help.

Yeah, if only I could locate them reliably (as documented ptep_clear_flush()
should imply one I guess) ... but it will depend on the context.


As I now have the attention of two people that understand that machinery,
here goes the PageAnonExclusive thing I *think* should be correct.

The IPI-based mechanism really did make such synchronization with
GUP-fast easier ...