RFC: NUMA modifications to cyclictest

From: Clark Williams
Date: Tue Jan 19 2010 - 18:15:03 EST


Lately we've been struggling with some performance issues on high
core-count (>16 cores) NUMA machines with the RT kernel. While
troubleshooting this issue, we tried using the 'numactl' program to
constrain our measurement tool (rteval) to a particular memory
node, rather than letting everything float. Doing so showed marked
improvement in both max latency and jitter. While this doesn't solve
our performance problems, I thought it might make sense to have a --numa
mode for cyclictest that complements the --smp mode just added.

The big difference here is that when using --numa, each measurement
thread (one per cpu) has its stack allocated from the memory node
associated with its cpu. Also, the major data structures for each
thread (parameter block, statistics block and histogram) are allocated
from the appropriate node. This is done with calls into libnuma,
which means cyclictest will gain a dependency on libnuma.
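
For anyone who wants a feel for the allocation pattern described above,
here is a minimal sketch. It is not the code from the numa branch. The
USE_LIBNUMA macro, STACK_SIZE, struct thread_ctx and all helper names
are made up for illustration; the only libnuma calls used are
numa_available(), numa_node_of_cpu(), numa_alloc_onnode() and
numa_free(). Without -DUSE_LIBNUMA it falls back to malloc() so it
still builds on a box without libnuma.

```c
/* With NUMA support: cc -DUSE_LIBNUMA sketch.c -lnuma -lpthread */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#ifdef USE_LIBNUMA
#include <numa.h>
#endif

#define STACK_SIZE (256 * 1024)  /* hypothetical per-thread stack size */

/* Hypothetical per-thread bookkeeping; not cyclictest's real layout. */
struct thread_ctx {
    int cpu;       /* cpu this thread is meant to measure on */
    int node;      /* memory node owning that cpu, -1 if unknown */
    void *stack;   /* stack memory taken from that node */
    int started;   /* set by the thread once it runs */
};

/* Memory node that owns `cpu`, or -1 when NUMA is not in use. */
static int node_for_cpu(int cpu)
{
#ifdef USE_LIBNUMA
    if (numa_available() >= 0)
        return numa_node_of_cpu(cpu);
#endif
    (void)cpu;
    return -1;
}

/* Allocate from `node` when known, otherwise from the normal heap. */
static void *node_alloc(size_t size, int node)
{
    assert(size > 0);
#ifdef USE_LIBNUMA
    if (node >= 0)
        return numa_alloc_onnode(size, node);
#endif
    (void)node;
    return malloc(size);
}

static void node_free(void *p, size_t size, int node)
{
#ifdef USE_LIBNUMA
    if (node >= 0) {
        numa_free(p, size);
        return;
    }
#endif
    (void)size;
    (void)node;
    free(p);
}

/* Put both the per-thread data and the stack on the cpu's own node. */
static struct thread_ctx *ctx_create(int cpu)
{
    int node = node_for_cpu(cpu);
    struct thread_ctx *ctx = node_alloc(sizeof(*ctx), node);

    if (!ctx)
        return NULL;
    memset(ctx, 0, sizeof(*ctx));
    ctx->cpu = cpu;
    ctx->node = node;
    ctx->stack = node_alloc(STACK_SIZE, node);
    if (!ctx->stack) {
        node_free(ctx, sizeof(*ctx), node);
        return NULL;
    }
    return ctx;
}

static void ctx_destroy(struct thread_ctx *ctx)
{
    int node = ctx->node;

    node_free(ctx->stack, STACK_SIZE, node);
    node_free(ctx, sizeof(*ctx), node);
}

/* Stand-in for the measurement loop: just marks that it ran. */
static void *stub_thread(void *arg)
{
    struct thread_ctx *ctx = arg;

    ctx->started = 1;
    return ctx;
}

/* Start `fn` on the node-local stack; returns 0 or a pthread error. */
static int ctx_start(struct thread_ctx *ctx, pthread_t *tid,
                     void *(*fn)(void *))
{
    pthread_attr_t attr;
    int err = pthread_attr_init(&attr);

    if (err)
        return err;
    err = pthread_attr_setstack(&attr, ctx->stack, STACK_SIZE);
    if (!err)
        err = pthread_create(tid, &attr, fn, ctx);
    pthread_attr_destroy(&attr);
    return err;
}
```

A caller would do ctx_create(cpu) for each cpu, ctx_start() each one,
pthread_join() them, then ctx_destroy(). The sketch leaves out pinning
the thread to ctx->cpu, which a real measurement thread would need for
the node-local memory to matter.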

The intent is to measure latency on a NUMA system in the same way a
well-written RT application would run on a NUMA machine, that is,
with off-node memory references kept to a minimum.

If you're interested in looking at this, please pull the numa branch
from my git repo at:


and let me know if you find bugs or disagree with the approach.

