Re: [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page

From: Peter Xu
Date: Thu Apr 28 2022 - 09:50:28 EST


Hi, Bibo,

On Thu, Mar 17, 2022 at 02:50:24AM -0400, Bibo Mao wrote:
> Collapsing a huge page copies data from regular small pages into a
> huge page. The destination node is the one that holds most of the
> source pages, but the THP daemon is not scheduled on that node.
> Performance may be poor because the huge page is copied across nodes
> and the target node's cache is not used. With this patch, the
> khugepaged daemon switches to the same NUMA node as the huge page,
> which saves copy time and makes better use of the local cache.
>
> With this patch, SPECint 2006 base performance improves by 6% on a
> Loongson 3C5000L platform with 32 cores and 8 NUMA nodes.

I'm not familiar with specint at all, so a naive question: will this make
a real difference in real-world workloads? I'd assume that on tuned systems
the memory affinity to the processors changes relatively slowly, so even if
khugepaged copies a bit slower, it shouldn't affect the real workload much
once the movement completes?

The other question: if this does make sense, is it applicable to file
THPs too (collapse_file)?

Thanks,

>
> Signed-off-by: Bibo Mao <maobibo@xxxxxxxxxxx>
> ---
> changelog:
> V2: remove node record for thp daemon
> V3: remove unlikely statement
> ---
> mm/khugepaged.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 131492fd1148..b3cf0885f5a2 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1066,6 +1066,7 @@ static void collapse_huge_page(struct mm_struct *mm,
>  	struct vm_area_struct *vma;
>  	struct mmu_notifier_range range;
>  	gfp_t gfp;
> +	const struct cpumask *cpumask;
>
>  	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
>
> @@ -1079,6 +1080,13 @@ static void collapse_huge_page(struct mm_struct *mm,
>  	 * that. We will recheck the vma after taking it again in write mode.
>  	 */
>  	mmap_read_unlock(mm);
> +
> +	/* sched to specified node before huge page memory copy */
> +	if (task_node(current) != node) {
> +		cpumask = cpumask_of_node(node);
> +		if (!cpumask_empty(cpumask))
> +			set_cpus_allowed_ptr(current, cpumask);
> +	}
>  	new_page = khugepaged_alloc_page(hpage, gfp, node);
>  	if (!new_page) {
>  		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
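
To make sure I read the logic right, here is a rough userspace model of the
two decisions involved: picking the node that holds most of the source
pages, and the guard in the hunk above that only moves the daemon when it is
on a different node and the target node has CPUs. This is only a sketch, not
the kernel code. pick_target_node(), should_move() and MAX_NODES are names
made up for illustration, and breaking ties toward the lowest node id is a
simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit; 8 matches the 8-node machine in the commit message. */
#define MAX_NODES 8

/*
 * Model of "dest node is the one holding most of the source pages".
 * page_nodes[i] is the NUMA node id of the i-th small page.
 * Ties go to the lowest node id (an assumption made for this sketch).
 */
int pick_target_node(const int *page_nodes, size_t nr_pages)
{
	size_t load[MAX_NODES] = { 0 };
	int target = 0;

	for (size_t i = 0; i < nr_pages; i++) {
		assert(page_nodes[i] >= 0 && page_nodes[i] < MAX_NODES);
		load[page_nodes[i]]++;
	}

	for (int nid = 1; nid < MAX_NODES; nid++)
		if (load[nid] > load[target])
			target = nid;

	return target;
}

/*
 * Model of the guard in the patch: move only if the task is not already
 * on the target node and that node actually has CPUs to run on.
 */
int should_move(int current_node, int target_node, size_t nr_cpus_on_target)
{
	return current_node != target_node && nr_cpus_on_target > 0;
}
```

With source pages on nodes {2, 2, 5, 2, 5, 0}, pick_target_node() returns 2,
and a daemon currently on node 0 would be moved only if node 2 has at least
one CPU.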

--
Peter Xu