Re: [RFC PATCH v1 3/4] mm/memory: Use ptep_get_lockless_norecency() for orig_pte

From: David Hildenbrand
Date: Tue Mar 26 2024 - 13:38:14 EST


On 26.03.24 18:27, Ryan Roberts wrote:
On 26/03/2024 17:02, David Hildenbrand wrote:
On 15.02.24 13:17, Ryan Roberts wrote:
Let's convert handle_pte_fault()'s use of ptep_get_lockless() to
ptep_get_lockless_norecency() to save orig_pte.

There are a number of places that follow this model:

     orig_pte = ptep_get_lockless(ptep)
     ...
     <lock>
     if (!pte_same(orig_pte, ptep_get(ptep)))
             // RACE!
     ...
     <unlock>

So we need to be careful to convert all of those to use
pte_same_norecency() so that the access and dirty bits are excluded from
the comparison.

Additionally there are a couple of places that genuinely rely on the
access and dirty bits of orig_pte, but with some careful refactoring, we
can use ptep_get() once we are holding the lock to achieve equivalent
logic.

We really should document that changed behavior somewhere where it can be easily
found: that orig_pte might have incomplete/stale accessed/dirty information.

I could add it to the orig_pte definition in the `struct vm_fault`?



@@ -5343,7 +5356,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
                           vmf->address, &vmf->ptl);
          if (unlikely(!vmf->pte))
              return 0;
-        vmf->orig_pte = ptep_get_lockless(vmf->pte);
+        vmf->orig_pte = ptep_get_lockless_norecency(vmf->pte);
          vmf->flags |= FAULT_FLAG_ORIG_PTE_VALID;

          if (pte_none(vmf->orig_pte)) {
@@ -5363,7 +5376,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)

      spin_lock(vmf->ptl);
      entry = vmf->orig_pte;
-    if (unlikely(!pte_same(ptep_get(vmf->pte), entry))) {
+    if (unlikely(!pte_same_norecency(ptep_get(vmf->pte), entry))) {
          update_mmu_tlb(vmf->vma, vmf->address, vmf->pte);
          goto unlock;

I was wondering about the following:

Assume the PTE is not dirty.

Thread 1 does

Sorry not sure what threads have to do with this? How is the vmf shared between
threads? What have I misunderstood...

Assume we have a HW that does not have HW-managed access/dirty bits. One that ends up using generic ptep_set_access_flags(). Access/dirty bits are always updated under PT lock.

Then, imagine two threads. One is the the fault path here. another thread performs some other magic that sets the PTE dirty under PTL.



vmf->orig_pte = ptep_get_lockless_norecency(vmf->pte)
/* not dirty */

/* Now, thread 2 ends up setting the PTE dirty under PT lock. */

spin_lock(vmf->ptl);
entry = vmf->orig_pte;
if (unlikely(!pte_same(ptep_get(vmf->pte), entry))) {
    ...
}
...
entry = pte_mkyoung(entry);

Do you mean pte_mkdirty() here? You're talking about dirty everywhere else.

No, that is just thread 1 seeing "oh, nothing to do" and then goes ahead and unconditionally does that in handle_pte_fault().


if (ptep_set_access_flags(vmf->vma, ...)
...
pte_unmap_unlock(vmf->pte, vmf->ptl);


Generic ptep_set_access_flags() will do another pte_same() check and realize
"hey, there was a change!" let's update the PTE!

set_pte_at(vma->vm_mm, address, ptep, entry);

This is called from the generic ptep_set_access_flags() in your example, right?


Yes.


would overwrite the dirty bit set by thread 2.

I'm not really sure what you are getting at... Is your concern that there is a
race where the page could become dirty in the meantime and it now gets lost? I
think that's why arm64 overrides ptep_set_access_flags(); since the hw can
update access/dirty we have to deal with the races.

My concern is that your patch can in subtle ways lead to use losing PTE dirty bits on architectures that don't have the HW-managed dirty bit. They do exist ;)

Arm64 should be fine in that regard.

--
Cheers,

David / dhildenb