[PATCH v2 0/5] mm: NUMA stats code cleanup and enhancement

From: Kemi Wang
Date: Tue Dec 19 2017 - 01:41:35 EST

The existing implementation of NUMA counters is per logical CPU along with
zone->vm_numa_stat[] separated by zone, plus a global NUMA counter array
vm_numa_stat[]. However, unlike the other vmstat counters, NUMA stats don't
affect the system's decisions and are only consumed when reading from /proc
and /sys. Also, nodes usually have only a single zone, except for node 0,
and there isn't really any use case that needs these hit counts separated
by zone.

Therefore, we can migrate the implementation of NUMA stats from per-zone to
per-node (as suggested by Andi Kleen), and reuse the existing per-cpu
infrastructure with a little enhancement for NUMA stats. In this way, we
can get rid of the special-case code for NUMA stats and keep the performance
gain at the same time. With this patch series, about 170 lines of code can
be removed.

The first patch migrates NUMA stats from per-zone to per-node using the
existing per-cpu infrastructure. There is a small user-visible change when
reading /proc/zoneinfo, as listed below:
        Before                            After
Node 0, zone      DMA             Node 0, zone      DMA
  per-node stats                    per-node stats
      nr_inactive_anon 7244             *numa_hit 98665086*
      nr_active_anon 177064             *numa_miss 0*
      ...                               *numa_foreign 0*
      nr_bounce    0                    *numa_interleave 21059*
      nr_free_cma  0                    *numa_local 98665086*
      *numa_hit 0*                      *numa_other 0*
      *numa_miss 0*                     nr_inactive_anon 20055
      *numa_foreign 0*                  nr_active_anon 389771
      *numa_interleave 0*               ...
      *numa_local 0*                    nr_bounce    0
      *numa_other 0*                    nr_free_cma  0

The second patch extends the local cpu counter vm_node_stat_diff from s8 to
s16. It makes no functional change.

The third patch uses a large and constant threshold size for NUMA counters
to reduce the global NUMA counters update frequency.
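
The idea behind the second and third patches can be illustrated with a small
user-space sketch. This is not the kernel code: NR_CPUS_DEMO,
NUMA_THRESHOLD_DEMO, cpu_diff[], node_counter and numa_stat_inc() are
hypothetical names, and the threshold value is an arbitrary assumption
chosen only because it does not fit in an s8. The fold logic is also
simplified compared with what mm/vmstat.c does.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_DEMO        4     /* hypothetical CPU count for the sketch */
#define NUMA_THRESHOLD_DEMO 1000  /* hypothetical; too large for an s8 (max 127) */

static int16_t cpu_diff[NR_CPUS_DEMO];  /* per-cpu local deltas, s16 wide */
static long node_counter;               /* shared per-node counter */
static unsigned long global_updates;    /* how often node_counter was touched */

/*
 * Count one event on @cpu. The shared counter is only updated once the
 * local delta exceeds the threshold, so a larger threshold means fewer
 * writes to the shared cache line.
 */
static void numa_stat_inc(int cpu)
{
	int16_t v;

	assert(cpu >= 0 && cpu < NR_CPUS_DEMO);

	v = (int16_t)(cpu_diff[cpu] + 1);
	if (v > NUMA_THRESHOLD_DEMO) {
		node_counter += v;   /* fold the local delta into the shared value */
		global_updates++;
		v = 0;
	}
	cpu_diff[cpu] = v;
}
```

With these assumed values, 2500 increments on one CPU touch the shared
counter only twice; the remaining events stay in the local delta until the
next fold.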

The fourth patch uses node_page_state_snapshot instead of node_page_state
when querying node stats (e.g. cat /sys/devices/system/node/node*/vmstat).
The only difference is that the stat values held in local cpus are also
included in node_page_state_snapshot.
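
The difference between the two kinds of read can be sketched in user space
as follows. The struct and function names here (node_stat_demo,
node_state_demo, node_state_snapshot_demo) are hypothetical stand-ins, not
the kernel's definitions, and clamping negative results to zero is a choice
made for this sketch.

```c
#include <stdint.h>

#define NR_CPUS_DEMO 4  /* hypothetical CPU count for the sketch */

struct node_stat_demo {
	long global;                     /* value already folded into the node */
	int16_t cpu_diff[NR_CPUS_DEMO];  /* per-cpu deltas not yet folded */
};

/* Cheap read: shared value only, so pending per-cpu deltas are missed. */
static long node_state_demo(const struct node_stat_demo *s)
{
	long x = s->global;

	return x < 0 ? 0 : x;
}

/* Snapshot read: shared value plus every CPU's pending delta. */
static long node_state_snapshot_demo(const struct node_stat_demo *s)
{
	long x = s->global;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
		x += s->cpu_diff[cpu];

	return x < 0 ? 0 : x;
}
```

The cheap read can lag by up to the threshold times the number of CPUs, which
matters more once the threshold is enlarged; the snapshot read pays for a
walk over all CPUs to avoid that deviation.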

The last patch renames zone_statistics() to numa_statistics().

Finally, I want to extend my heartiest appreciation to Michal Hocko for his
suggestion of reusing the existing per-cpu infrastructure, which makes this
series much better than before.

a) enhance the existing per-cpu infrastructure for node page stats by
extending local cpu counters vm_node_stat_diff from s8 to s16
b) reuse the per-cpu infrastructure for NUMA stats

Kemi Wang (5):
mm: migrate NUMA stats from per-zone to per-node
mm: Extends local cpu counter vm_diff_nodestat from s8 to s16
mm: enlarge NUMA counters threshold size
mm: use node_page_state_snapshot to avoid deviation
mm: Rename zone_statistics() to numa_statistics()

drivers/base/node.c | 28 +++----
include/linux/mmzone.h | 31 ++++----
include/linux/vmstat.h | 31 --------
mm/mempolicy.c | 2 +-
mm/page_alloc.c | 22 +++---
mm/vmstat.c | 206 +++++++++----------------------------------------
6 files changed, 74 insertions(+), 246 deletions(-)