By default the application uses malloc() and all available CPUs. This
patch introduces NUMA support which means:
- memory is allocated node local via numa_alloc_local()
- all CPUs of the specified NUMA node are used. This is also true if the
number of threads set is greater than the number of CPUs available on
this node.