Re: NUMA API observations
From: Andi Kleen
Date: Tue Jun 15 2004 - 08:29:44 EST
Thomas Zehetbauer <thomasz@xxxxxxxxxxxxxx> writes:
> Looking at these numastat results and the default policy it seems that
> memory is primarily allocated on the first node which in turn means an
> unnecessarily large amount of page faults on the second node.
NUMA memory policy has nothing to do with the number of page faults;
it only determines from which node memory is allocated when a page is
faulted in.
If you get most allocations on the first node it means either that
most programs run on the first node (assuming they don't use the NUMA
API to change their memory affinity) or, more likely, that the
programs running on node 0 need more memory than those running on
node 1.
That's easily possible: a typical desktop, for example, uses most of
its memory in the X server, so if it runs on node 0 you get such
skewed statistics. On servers it is often similar.
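For what it's worth, you can check where pages actually ended up:
get_mempolicy() from the NUMA API reports the node a given address
was allocated on. A rough, untested sketch (assumes libnuma's
<numaif.h>, link with -lnuma):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <numaif.h>

int main(void)
{
        size_t len = 16 * 4096;
        char *p = malloc(len);
        int node = -1;

        if (!p)
                return 1;
        /* Touch the memory so it is actually faulted in; with the
           default policy that happens on the node the process is
           currently running on. */
        memset(p, 1, len);

        if (get_mempolicy(&node, NULL, 0, p,
                          MPOL_F_NODE | MPOL_F_ADDR) < 0) {
                perror("get_mempolicy");
                return 1;
        }
        printf("page at %p is on node %d\n", (void *)p, node);
        return 0;
}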
One way to combat that, if it were really a problem, would be to run
the X server with an interleaved policy (numactl --interleave=all
XFree86)[1], but I would recommend careful benchmarking first to see
whether it is really a win. Normally the better local memory latency
is the better choice.
[1] Don't do that with startx or xinit; the rest of the X session
should probably not use that policy.
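If you want the same thing from inside a program instead of through
numactl, set_mempolicy() can request interleaving before the memory
is allocated. Untested sketch, assuming <= 64 nodes so the mask fits
into one unsigned long:

#include <stdio.h>
#include <numa.h>
#include <numaif.h>

int main(void)
{
        unsigned long nodemask = 0;
        int i, maxnode;

        if (numa_available() < 0) {
                fprintf(stderr, "no NUMA support\n");
                return 1;
        }
        maxnode = numa_max_node();

        /* One bit per node in the system (assumes node numbers are
           contiguous 0..maxnode). */
        for (i = 0; i <= maxnode; i++)
                nodemask |= 1UL << i;

        /* libnuma passes "number of bits + 1" as the third argument,
           so do the same here. */
        if (set_mempolicy(MPOL_INTERLEAVE, &nodemask, maxnode + 2) < 0) {
                perror("set_mempolicy");
                return 1;
        }

        /* Everything faulted in from here on is spread round robin
           over the nodes in the mask, like numactl --interleave=all. */
        return 0;
}

Link with -lnuma. This only affects memory the process touches after
the call, which is why numactl sets the policy up before exec'ing the
target program.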
> I wonder if it is possible to better balance processes among the nodes
> by e.g. setting nodeAffinity = pid mod nodeCount
I assume you mean scheduling, not memory affinity, here. execve() and
clone() already do something like that (but based on node loads, not
PIDs); fork() does not.
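If someone wanted to experiment with the pid-modulo idea from user
space anyway (e.g. in a wrapper run at program start), libnuma makes
it a few lines. Rough, untested sketch; it assumes node numbers are
contiguous and ignores node load, which is exactly why the scheduler
does it differently:

#include <stdio.h>
#include <unistd.h>
#include <numa.h>

int main(void)
{
        int node;

        if (numa_available() < 0) {
                fprintf(stderr, "no NUMA support\n");
                return 1;
        }

        /* The suggestion from the original mail: pick a node purely
           by pid. */
        node = getpid() % (numa_max_node() + 1);

        /* Run on that node's CPUs and prefer its memory. */
        if (numa_run_on_node(node) < 0)
                perror("numa_run_on_node");
        numa_set_preferred(node);

        printf("pid %d bound to node %d\n", getpid(), node);
        return 0;
}

Compile with -lnuma.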
-Andi