NUMA API observations

From: Anton Blanchard
Date: Mon Jun 14 2004 - 10:43:22 EST



Hi Andi,

I had a chance to test out the NUMA API on ppc64. I started with a 64-bit
userspace; I'll send on some 32/64-bit compat patches shortly.

This machine is weird in that there are lots of nodes with no memory.
Sure, I should probably not set those nodes online, but it's a test setup
that is good for finding various issues (like node failover when one is
OOM).

As you can see, only nodes 0 and 1 have memory:

# cat /proc/buddyinfo
Node 1, zone DMA 0 53 62 22 1 0 1 1 1 1 0 0 0 1 60
Node 0, zone DMA 136 18 5 1 2 1 0 1 0 1 1 0 0 0 59

# numastat
                 node7  node6  node5  node4  node3  node2    node1    node0
numa_hit             0      0      0      0      0      0    30903  2170759
numa_miss            0      0      0      0      0      0        0        0
numa_foreign         0      0      0      0      0      0        0        0
interleave_hit       0      0      0      0      0      0      715      835
local_node           0      0      0      0      0      0    28776  2170737
other_node           0      0      0      0      0      0     2127       22

Now if I try to interleave across all nodes, the task gets OOM killed:

# numactl --interleave=all /bin/sh
Killed

in dmesg: VM: killing process sh

It works if I specify the nodes with memory:

# numactl --interleave=0,1 /bin/sh

Is this expected, or do we want it to fall back when there is plenty of
memory on other nodes?
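
For reference, here is roughly what I assume --interleave=all boils down
to, as a sketch against the set_mempolicy() wrapper in libnuma's numaif.h
(built with -lnuma); the 0xff node mask and the 64MB allocation are just
my test values:

/* interleave.c: sketch of an "interleave across all nodes" policy,
 * assuming libnuma's <numaif.h> wrapper for the set_mempolicy() syscall. */
#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    unsigned long nodemask = 0xffUL;   /* nodes 0-7, memoryless ones included */
    char *p;

    /* Allocations after this should be spread round-robin over the mask;
     * the question is what happens when a node in the mask has no memory. */
    if (set_mempolicy(MPOL_INTERLEAVE, &nodemask, 8 * sizeof(nodemask)) < 0) {
        perror("set_mempolicy");
        exit(1);
    }

    p = malloc(64UL << 20);
    if (p)
        memset(p, 0, 64UL << 20);      /* touch it so pages really get allocated */
    return 0;
}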

A similar scenario happens with:

# numactl --preferred=7 /bin/sh
Killed

in dmesg: VM: killing process numactl

The manpage says we should fall back to other nodes when the preferred
node is OOM.
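
Same idea in sketch form, again hedging on the numaif.h wrapper; if the
documented fallback worked, this should allocate from node 0 or 1 instead
of getting killed (node number and sizes are just my test values):

/* preferred.c: sketch of MPOL_PREFERRED pointing at a memoryless node;
 * per the manpage the allocation should fall back rather than OOM. */
#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    unsigned long nodemask = 1UL << 7;   /* prefer node 7, which has no memory */
    char *p;

    if (set_mempolicy(MPOL_PREFERRED, &nodemask, 8 * sizeof(nodemask)) < 0) {
        perror("set_mempolicy");
        exit(1);
    }

    p = malloc(64UL << 20);
    if (p)
        memset(p, 0, 64UL << 20);        /* should come from node 0 or 1 */
    return 0;
}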

numactl CPU affinity looks broken on systems with large cpumasks:

# numactl --cpubind=0 /bin/sh
sched_setaffinity: Invalid argument

sched_setaffinity(19470, 64, { 0, 0, 0, 0, 0, 0, 2332313320, 534d50204d6f6e20 }) = -1 EINVAL (Invalid argument)

My kernel is compiled with NR_CPUS=128, and the setaffinity syscall must be
called with a bitmap at least as big as the kernel's cpumask_t. I will
submit a patch for this shortly.
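
The sketch below shows the kind of probing I have in mind for the fix:
call the raw sched_getaffinity syscall with a growing buffer until the
kernel stops returning EINVAL, then use that length for sched_setaffinity.
The 4096-byte cap is an arbitrary assumption on my part:

/* affinity.c: size the CPU mask for the raw sched_{get,set}affinity
 * syscalls by probing; the kernel rejects buffers smaller than its
 * cpumask_t (16 bytes here with NR_CPUS=128) with -EINVAL. */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    size_t len;

    for (len = sizeof(unsigned long); len <= 4096; len *= 2) {
        unsigned char *mask = calloc(1, len);

        if (syscall(__NR_sched_getaffinity, 0, len, mask) >= 0) {
            printf("kernel cpumask_t looks like %zu bytes\n", len);

            /* Bind to CPU 0, passing the full kernel-sized mask. */
            memset(mask, 0, len);
            mask[0] = 1;
            if (syscall(__NR_sched_setaffinity, 0, len, mask) < 0)
                perror("sched_setaffinity");
            free(mask);
            return 0;
        }
        free(mask);
        if (errno != EINVAL)
            break;
    }
    perror("sched_getaffinity");
    return 1;
}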

Next I looked at the numactl --show info:

# numactl --show
policy: default
preferred node: 0
interleavemask:
interleavenode: 0
nodebind: 0 1 2 3
membind: 0 1 2 3 4 5 6 7

What's the difference between nodebind and membind? Why don't I see all 8
nodes in both of them? I notice that if I do membind=all then I only see
the nodes with memory:

# numactl --membind=all --show
policy: bind
preferred node: 0
interleavemask:
interleavenode: 0
nodebind: 0 1 2 3
membind: 0 1

That kind of makes sense, but I don't understand why we have 4 nodes in
the nodebind field. My CPU layout is not contiguous; perhaps that's why
nodebind comes out strange (see the sketch after the listing below):

processor : 0
processor : 1
processor : 2
processor : 3
processor : 16
processor : 17
processor : 18
processor : 19
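
To illustrate, here is a sketch of pulling the CPU-to-node mapping out of
sysfs instead of assuming contiguous CPU numbering; the
/sys/devices/system/node/nodeN/cpumap path and its comma-separated hex
format are assumptions about what my kernel exposes:

/* nodemap.c: sketch of deriving the CPU-to-node mapping from sysfs rather
 * than assuming CPUs are numbered contiguously per node. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NODES 8

int main(void)
{
    int node;

    for (node = 0; node < MAX_NODES; node++) {
        char path[64], buf[256];
        unsigned long words[32];
        int nwords = 0, pos = 0, i;
        char *word;
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/node/node%d/cpumap", node);
        f = fopen(path, "r");
        if (!f)
            continue;                  /* node not present in sysfs */
        if (!fgets(buf, sizeof(buf), f))
            buf[0] = '\0';
        fclose(f);

        /* Split the comma-separated 32-bit hex words (most significant first). */
        for (word = strtok(buf, ",\n"); word && nwords < 32;
             word = strtok(NULL, ",\n"))
            words[nwords++] = strtoul(word, NULL, 16);

        printf("node %d cpus:", node);
        for (i = nwords - 1; i >= 0; i--, pos += 32) {
            int bit;

            for (bit = 0; bit < 32; bit++)
                if (words[i] & (1UL << bit))
                    printf(" %d", pos + bit);
        }
        printf("\n");
    }
    return 0;
}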

Anton