Re: [PATCH v2 1/2] sched/numa: skip VMA scanning on memory pinned to one NUMA node via cpuset.mems

From: Libo Chen
Date: Tue Apr 01 2025 - 13:45:34 EST

On 4/1/25 06:23, Mel Gorman wrote:
> On Wed, Mar 26, 2025 at 05:23:51PM -0700, Libo Chen wrote:
>> When the memory of the current task is pinned to one NUMA node by cgroup,
>> there is no point in continuing the rest of VMA scanning and hinting page
>> faults as they will just be overhead. With this change, there will be no
>> more unnecessary PTE updates or page faults in this scenario.
>>
>> We have seen up to a 6x improvement on a typical java workload running on
>> VMs with memory and CPU pinned to one NUMA node via cpuset in a two-socket
>> AARCH64 system. With the same pinning, on an 18-cores-per-socket Intel
>> platform, we have seen a 20% improvement in a microbench that creates a
>> 30-vCPU selftest KVM guest with 4GB memory, where each vCPU reads 4KB
>> pages in a fixed number of loops.
>>
>> Signed-off-by: Libo Chen <libo.chen@xxxxxxxxxx>
>> ---
>> kernel/sched/fair.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index e43993a4e5807..6f405e00c9c7e 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -3329,6 +3329,13 @@ static void task_numa_work(struct callback_head *work)
>>  	if (p->flags & PF_EXITING)
>>  		return;
>>
>> +	/*
>> +	 * Memory is pinned to only one NUMA node via cpuset.mems, naturally
>> +	 * no page can be migrated.
>> +	 */
>> +	if (nodes_weight(cpuset_current_mems_allowed) == 1)
>> +		return;
>> +
>
> Check cpusets_enabled() first?
>

Hi Mel,

Yeah, I can add that, but isn't it a bit redundant? When
!cpusets_enabled() and there are >= 2 NUMA nodes,
nodes_weight(cpuset_current_mems_allowed) will just return the number of
nodes, which is not equal to 1.
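
To make the comparison concrete, here is a rough userspace sketch of the
two variants. It is not kernel code: the model_* names and the 64-bit
mask are made up for illustration, and it assumes that without any cpuset
restriction the allowed mask simply covers every node.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for nodemask_t: one bit per NUMA node. */
typedef uint64_t model_nodemask_t;

/* Stand-in for nodes_weight(): count the set bits in the mask. */
static int model_nodes_weight(model_nodemask_t mask)
{
	int weight = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		weight++;
	}
	return weight;
}

/* Check as posted in v2: skip scanning when exactly one node is allowed. */
static bool model_skip_scan_v2(model_nodemask_t mems_allowed)
{
	return model_nodes_weight(mems_allowed) == 1;
}

/* Suggested variant: only look at the mask when cpusets are in use. */
static bool model_skip_scan_guarded(bool cpusets_on,
				    model_nodemask_t mems_allowed)
{
	return cpusets_on && model_nodes_weight(mems_allowed) == 1;
}
```

With a two-node mask (0x3) both helpers return false, so under the
assumption above the extra guard does not change the result on a
multi-node system. The two only disagree when the mask has a single bit
set while cpusets are off, i.e. the single-node case in this model.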


Libo