[PATCH] Don't touch single threaded PTEs which are on the right node

From: Andi Kleen
Date: Wed Oct 12 2016 - 12:17:06 EST


From: Andi Kleen <ak@xxxxxxxxxxxxxxx>

We had some problems with pages getting unmapped in single-threaded,
affinitized processes. This was tracked down to NUMA scanning.

It doesn't make any sense to unmap pages when the process is
single-threaded and the page is already on the node the process
is running on.

Add a check for this case to the NUMA protection code
(change_pte_range()), and skip the unmapping when it holds.

In theory the process could be migrated to another node later,
but a later rescan will then unmap the pages so they can be migrated.

This could be made more sophisticated by remembering this
state per process or even per mm. However, that would
need extra tracking and be more complicated, and the
simple check seems to work fine so far.

v2: Only do it for private VMAs. Move most of the check out of
the loop.

Signed-off-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
---
mm/mprotect.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
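
For reviewers, the decision described in the changelog can be modeled
in userspace roughly as follows. This is only a sketch: struct
model_vma, NO_TARGET_NODE, pick_target_node() and skip_numa_unmap()
are hypothetical names that do not exist in the kernel. They stand in
for vma->vm_flags & VM_SHARED, mm_users, cpu_to_node() and
page_to_nid() as used in the diff below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel: no valid node id is negative. */
#define NO_TARGET_NODE (-1)

/* Hypothetical stand-in for the VMA/mm state the patch consults. */
struct model_vma {
	bool shared;	/* models vma->vm_flags & VM_SHARED */
	int mm_users;	/* models atomic_read(&vma->vm_mm->mm_users) */
};

/*
 * Evaluated once per PTE range, outside the loop: a target node is
 * only picked for NUMA-hinting scans of private mappings that
 * belong to an mm with a single user.
 */
static int pick_target_node(const struct model_vma *vma, bool prot_numa,
			    int current_node)
{
	if (prot_numa && !vma->shared && vma->mm_users == 1)
		return current_node;
	return NO_TARGET_NODE;
}

/*
 * Evaluated per page inside the loop: true means the PTE is left
 * alone. With NO_TARGET_NODE this can never match a real node id.
 */
static bool skip_numa_unmap(int target_node, int page_node)
{
	assert(page_node >= 0);
	return target_node == page_node;
}
```

With a private mapping, one mm user and the task running on node 2,
pick_target_node() returns 2, so pages already on node 2 are skipped
while pages on any other node are still unmapped as before.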

diff --git a/mm/mprotect.c b/mm/mprotect.c
index a4830f0325fe..e9473e7e1468 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -68,11 +68,17 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 	pte_t *pte, oldpte;
 	spinlock_t *ptl;
 	unsigned long pages = 0;
+	int target_node = -1;

 	pte = lock_pte_protection(vma, pmd, addr, prot_numa, &ptl);
 	if (!pte)
 		return 0;

+	if (prot_numa &&
+	    !(vma->vm_flags & VM_SHARED) &&
+	    atomic_read(&vma->vm_mm->mm_users) == 1)
+		target_node = cpu_to_node(raw_smp_processor_id());
+
 	arch_enter_lazy_mmu_mode();
 	do {
 		oldpte = *pte;
@@ -94,6 +100,13 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 				/* Avoid TLB flush if possible */
 				if (pte_protnone(oldpte))
 					continue;
+
+				/*
+				 * Don't mess with PTEs if page is already on the node
+				 * a single-threaded process is running on.
+				 */
+				if (target_node == page_to_nid(page))
+					continue;
 			}

 			ptent = ptep_modify_prot_start(mm, addr, pte);
--
2.5.5