Re: [PATCH] sched/numa: Add statistics of numa balance task migration and swap

From: K Prateek Nayak
Date: Wed Apr 02 2025 - 13:35:44 EST


Hello Chenyu,

On 4/2/2025 6:36 AM, Chen Yu wrote:
On systems with NUMA balancing enabled, it is found that tracking
the task activities due to NUMA balancing is helpful. NUMA balancing
has two mechanisms for task migration: one is to migrate the task to
an idle CPU in its preferred node, the other is to swap tasks on
different nodes if they are on each other's preferred node.

The kernel already has NUMA page migration statistics in
/sys/fs/cgroup/mytest/memory.stat and /proc/{PID}/sched,
but does not have statistics for task migration/swap.
Add the task migration and swap counts accordingly.

The following two new fields:

numa_task_migrated
numa_task_swapped

will be displayed in both
/sys/fs/cgroup/{GROUP}/memory.stat and /proc/{PID}/sched.
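
For the cgroup side, the two counters can be pulled out of memory.stat
with a short helper. This is only a sketch: it assumes a kernel with this
patch applied and the usual "key value" layout of memory.stat, and
numa_cgroup_stats is a hypothetical name used for illustration.

```shell
# Print the two NUMA task counters from a cgroup v2 memory.stat file.
# $1 is the cgroup directory, e.g. /sys/fs/cgroup/{GROUP}.
# Prints nothing if the kernel does not expose these fields.
numa_cgroup_stats() {
    awk '$1 == "numa_task_migrated" || $1 == "numa_task_swapped" { print $1, $2 }' \
        "$1/memory.stat"
}
```

Usage: numa_cgroup_stats /sys/fs/cgroup/{GROUP}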

Running sched-messaging with schedstats enabled, I could see both
"numa_task_migrated" and "numa_task_swapped" being populated for the
sched-messaging threads:

$ for i in $(ls /proc/4030/task/); do grep "numa_task_migrated" /proc/$i/sched; done | tr -s ' ' | cut -d ' ' -f3 | sort | uniq -c
400 0
231 1
10 2

$ for i in $(ls /proc/4030/task/); do grep "numa_task_swapped" /proc/$i/sched; done | tr -s ' ' | cut -d ' ' -f3 | sort | uniq -c
389 0
193 1
47 2
11 3
1 4
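
The two one-liners above differ only in the field name, so they can be
folded into one helper. A minimal sketch, assuming the per-thread lines
in /proc/{PID}/task/{TID}/sched have the "name : value" layout that the
cut -f3 above relies on; numa_field_hist and the PROC_ROOT override are
hypothetical names used only for illustration.

```shell
# Histogram of one per-thread counter across all threads of a process.
# $1 = PID, $2 = field name (e.g. numa_task_migrated).
# PROC_ROOT is a hypothetical override for testing; it defaults to /proc.
# Output: one "<thread count> <value>" line per distinct value.
numa_field_hist() {
    pid=$1
    field=$2
    root=${PROC_ROOT:-/proc}
    for f in "$root/$pid"/task/*/sched; do
        # Threads may exit while we iterate; skip files that vanished.
        [ -r "$f" ] || continue
        # Match the field name exactly and print the last column (the value).
        awk -v k="$field" '$1 == k { print $NF }' "$f"
    done | sort -n | uniq -c
}
```

Usage: numa_field_hist 4030 numa_task_swapped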


Previous RFC version can be found here:
https://lore.kernel.org/lkml/1847c5ef828ad4835a35e3a54b88d2e13bce0eea.1740483690.git.yu.c.chen@xxxxxxxxx/

Signed-off-by: Chen Yu <yu.c.chen@xxxxxxxxx>

Feel free to add:

Tested-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>

--
Thanks and Regards,
Prateek

---
RFC->v1: Rename nr_numa_task_migrated to
numa_task_migrated, and nr_numa_task_swapped to
numa_task_swapped in /proc/{PID}/sched,
so both cgroup's memory.stat and task's
sched have the same field name.


[..snip..]