Re: [PATCH] Don't touch single threaded PTEs which are on the right node

From: Mel Gorman
Date: Thu Oct 13 2016 - 04:39:39 EST


On Wed, Oct 12, 2016 at 09:15:49AM -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@xxxxxxxxxxxxxxx>
>
> We had some problems with pages getting unmapped in single threaded
> affinitized processes. It was tracked down to NUMA scanning.
>
> In this case it doesn't make any sense to unmap pages if the
> process is single threaded and the page is already on the
> node the process is running on.
>
> Add a check for this case into the numa protection code,
> and skip unmapping if true.
>
> In theory the process could be migrated later, but we
> will eventually rescan and unmap and migrate then.
>
> In theory this could be made more fancy: remembering this
> state per process or even whole mm. However that would
> need extra tracking and be more complicated, and the
> simple check seems to work fine so far.
>
> v2: Only do it for private VMAs. Move most of check out of
> loop.
> Signed-off-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>

Minor comments

> ---
> mm/mprotect.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index a4830f0325fe..e9473e7e1468 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -68,11 +68,17 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
> pte_t *pte, oldpte;
> spinlock_t *ptl;
> unsigned long pages = 0;
> + int target_node = -1;
>

Proper convention is to use NUMA_NO_NODE instead of -1 although it's not
always adhered to.

> pte = lock_pte_protection(vma, pmd, addr, prot_numa, &ptl);
> if (!pte)
> return 0;
>
> + if (prot_numa &&
> + !(vma->vm_flags & VM_SHARED) &&
> + atomic_read(&vma->vm_mm->mm_users) == 1)
> + target_node = cpu_to_node(raw_smp_processor_id());
> +

Use numa_node_id() instead of open-coding this. A short comment probably
would not hurt even if git blame should make it obvious.

> arch_enter_lazy_mmu_mode();
> do {
> oldpte = *pte;
> @@ -94,6 +100,13 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
> /* Avoid TLB flush if possible */
> if (pte_protnone(oldpte))
> continue;
> +
> + /*
> + * Don't mess with PTEs if page is already on the node
> + * a single-threaded process is running on.
> + */
> + if (target_node == page_to_nid(page))
> + continue;
> }
>

Check target_node != NUMA_NODE && target_node == page_to_nid(page) to
avoid unnecessary page->flag masking and shifts?

The last one will be fairly marginal, the others are taste so whether
you spin a v3 with the corrections or not;

Acked-by: Mel Gorman <mgorman@xxxxxxx>

Thanks.

--
Mel Gorman
SUSE Labs