Re: [PATCH] mm: numa: return the number of base pages altered by protection changes

From: Luis Henriques
Date: Wed Dec 04 2013 - 11:52:04 EST


On Tue, Dec 03, 2013 at 03:04:00PM +0000, Mel Gorman wrote:
> commit 72403b4a0fbdf433c1fe0127e49864658f6f6468 upstream.

Thank you Mel, I'll queue this backport for the 3.11 kernel.

Cheers,
--
Luis


>
> Commit 0255d4918480 ("mm: Account for a THP NUMA hinting update as
> one PTE update") was added to account for the number of PTE updates
> when marking pages prot_numa. task_numa_work was using the old
> return value to track how much address space had been updated.
> Altering the return value causes the scanner to do more work than it
> is configured or documented to in a single unit of work.
>
> This patch reverts that commit and accounts for the number of THP
> updates separately in vmstat. It is up to the administrator to
> interpret the pair of values correctly. This is a straightforward
> operation and is likely to be of interest only when actively debugging
> NUMA balancing problems.
>
> The impact of this patch is that the NUMA PTE scanner will scan slower
> when THP is enabled and workloads may converge slower as a result. On
> the flip side, system CPU usage should be lower than recent tests
> reported. The following is an illustrative example from a short,
> single-JVM specjbb test:
>
> specjbb
>             3.12.0                 3.12.0
>            vanilla            acctupdates
> TPut 1    26143.00 (  0.00%)     25747.00 ( -1.51%)
> TPut 7   185257.00 (  0.00%)    183202.00 ( -1.11%)
> TPut 13  329760.00 (  0.00%)    346577.00 (  5.10%)
> TPut 19  442502.00 (  0.00%)    460146.00 (  3.99%)
> TPut 25  540634.00 (  0.00%)    549053.00 (  1.56%)
> TPut 31  512098.00 (  0.00%)    519611.00 (  1.47%)
> TPut 37  461276.00 (  0.00%)    474973.00 (  2.97%)
> TPut 43  403089.00 (  0.00%)    414172.00 (  2.75%)
>
>               3.12.0      3.12.0
>              vanilla acctupdates
> User         5169.64     5184.14
> System        100.45       80.02
> Elapsed       252.75      251.85
>
> Performance is similar, but note the reduction in system CPU time. While
> this test showed a performance gain, the gain will not be universal, but
> at least the scanner will behave as documented. The vmstats are obviously
> different; here is how mmtests interprets them.
>
>                               3.12.0      3.12.0
>                              vanilla acctupdates
> NUMA page range updates      1408326    11043064
> NUMA huge PMD updates              0       21040
> NUMA PTE updates             1408326      291624
>
> "NUMA page range updates" == nr_pte_updates and is the value returned to
> the NUMA PTE scanner. "NUMA huge PMD updates" is the number of THP
> updates; in combination, the two values can be used from userspace to
> calculate how many PTEs were updated.
>
> Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
> Reported-by: Alex Thorlton <athorlton@xxxxxxx>
> Reviewed-by: Rik van Riel <riel@xxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
> ---
> include/linux/vm_event_item.h | 1 +
> mm/mprotect.c | 7 ++++++-
> mm/vmstat.c | 1 +
> 3 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index 1855f0a..c557c6d 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -39,6 +39,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> PAGEOUTRUN, ALLOCSTALL, PGROTATED,
> #ifdef CONFIG_NUMA_BALANCING
> NUMA_PTE_UPDATES,
> + NUMA_HUGE_PTE_UPDATES,
> NUMA_HINT_FAULTS,
> NUMA_HINT_FAULTS_LOCAL,
> NUMA_PAGE_MIGRATE,
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 412ba2b..6c3f56f 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -138,6 +138,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
> pmd_t *pmd;
> unsigned long next;
> unsigned long pages = 0;
> + unsigned long nr_huge_updates = 0;
> bool all_same_node;
>
> pmd = pmd_offset(pud, addr);
> @@ -148,7 +149,8 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
> split_huge_page_pmd(vma, addr, pmd);
> else if (change_huge_pmd(vma, pmd, addr, newprot,
> prot_numa)) {
> - pages++;
> + pages += HPAGE_PMD_NR;
> + nr_huge_updates++;
> continue;
> }
> /* fall through */
> @@ -168,6 +170,9 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
> change_pmd_protnuma(vma->vm_mm, addr, pmd);
> } while (pmd++, addr = next, addr != end);
>
> + if (nr_huge_updates)
> + count_vm_numa_events(NUMA_HUGE_PTE_UPDATES, nr_huge_updates);
> +
> return pages;
> }
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 9bb3145..5a442a7 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -812,6 +812,7 @@ const char * const vmstat_text[] = {
>
> #ifdef CONFIG_NUMA_BALANCING
> "numa_pte_updates",
> + "numa_huge_pte_updates",
> "numa_hint_faults",
> "numa_hint_faults_local",
> "numa_pages_migrated",
> --
> To unsubscribe from this list: send the line "unsubscribe stable" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html