[PATCH] mm/gup: update refcount+pincount before testing if the PTE changed

From: David Hildenbrand
Date: Mon Aug 29 2022 - 10:57:07 EST


mm/ksm.c:write_protect_page() has to make sure that no unknown
references to a mapped page exist, and that no additional references
with write permissions can be taken: an unknown reference could have
write permissions and modify the page afterwards.

Conceptually, mm/ksm.c:write_protect_page() consists of:
(1) Clear/invalidate PTE
(2) Check if there are unknown references; back off if so.
(3) Update PTE (e.g., map it R/O)

Conceptually, GUP-fast code consists of:
(1) Read the PTE
(2) Increment refcount/pincount of the mapped page
(3) Check if the PTE changed by re-reading it; back off if so.

To make sure GUP-fast won't be able to grab additional references after
clearing the PTE, but will properly detect the change and back off, we
need a memory barrier between updating the refcount/pincount and
re-checking whether the PTE changed.

try_grab_folio() doesn't necessarily imply a memory barrier, so add an
explicit smp_mb__after_atomic() after the atomic RMW operation to
increment the refcount and pincount.

ptep_clear_flush(), which is used to clear the PTE and flush the TLB,
should imply a memory barrier as part of the TLB flush, so don't add
another one for now.

PageAnonExclusive handling requires further care and will be handled
separately.

Fixes: 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()")
Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
---
mm/gup.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index 5abdaf487460..0008b808f484 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2392,6 +2392,14 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
goto pte_unmap;
}

+ /*
+ * Update refcount/pincount before testing for changed PTE. This
+ * is required for code like mm/ksm.c:write_protect_page() that
+ * wants to make sure that a page has no unknown references
+ * after clearing the PTE.
+ */
+ smp_mb__after_atomic();
+
if (unlikely(pte_val(pte) != pte_val(*ptep))) {
gup_put_folio(folio, 1, flags);
goto pte_unmap;
@@ -2577,6 +2585,9 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
if (!folio)
return 0;

+ /* See gup_pte_range(). */
+ smp_mb__after_atomic();
+
if (unlikely(pte_val(pte) != pte_val(*ptep))) {
gup_put_folio(folio, refs, flags);
return 0;
@@ -2643,6 +2654,9 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
if (!folio)
return 0;

+ /* See gup_pte_range(). */
+ smp_mb__after_atomic();
+
if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
gup_put_folio(folio, refs, flags);
return 0;
@@ -2683,6 +2697,9 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
if (!folio)
return 0;

+ /* See gup_pte_range(). */
+ smp_mb__after_atomic();
+
if (unlikely(pud_val(orig) != pud_val(*pudp))) {
gup_put_folio(folio, refs, flags);
return 0;
--
2.37.1




--
Thanks,

David / dhildenb