I can confirm single JVM JBB is working well for me. I see a 30%
improvement over autoNUMA. What I can't make sense of are some of the
perf stats (taken at 80 warehouses on 4 x WST-EX, 512GB memory):
tip numa/core:
5,429,632,865 node-loads
3,806,419,082 node-load-misses(70.1%)
2,486,756,884 node-stores
2,042,557,277 node-store-misses(82.1%)
2,878,655,372 node-prefetches
2,201,441,900 node-prefetch-misses
autoNUMA:
4,538,975,144 node-loads
2,666,374,830 node-load-misses(58.7%)
2,148,950,354 node-stores
1,682,942,931 node-store-misses(78.3%)
2,191,139,475 node-prefetches
1,633,752,109 node-prefetch-misses
The percentage of misses is higher for numa/core. I would have expected
the performance increase to be due to fewer "node-misses", but perhaps I
am misinterpreting the perf data.
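
For anyone who wants to redo the arithmetic, here is a minimal sketch that
recomputes the miss percentages from the raw counts above, including the
prefetch ratios that were not annotated. The dictionary layout and the
function names are just illustrative; only the counter values come from
the perf output.

```python
# Raw counter values copied from the perf output above.
COUNTERS = {
    "numa/core": {
        "node-loads": 5_429_632_865,
        "node-load-misses": 3_806_419_082,
        "node-stores": 2_486_756_884,
        "node-store-misses": 2_042_557_277,
        "node-prefetches": 2_878_655_372,
        "node-prefetch-misses": 2_201_441_900,
    },
    "autoNUMA": {
        "node-loads": 4_538_975_144,
        "node-load-misses": 2_666_374_830,
        "node-stores": 2_148_950_354,
        "node-store-misses": 1_682_942_931,
        "node-prefetches": 2_191_139_475,
        "node-prefetch-misses": 1_633_752_109,
    },
}

# (total counter, miss counter) pairs as they are named in the output.
PAIRS = [
    ("node-loads", "node-load-misses"),
    ("node-stores", "node-store-misses"),
    ("node-prefetches", "node-prefetch-misses"),
]


def miss_rate(misses, accesses):
    """Return misses as a percentage of accesses."""
    return 100.0 * misses / accesses


def miss_rates(counters):
    """Map each miss counter name to its miss percentage."""
    return {
        miss_key: miss_rate(counters[miss_key], counters[total_key])
        for total_key, miss_key in PAIRS
    }


if __name__ == "__main__":
    for kernel, counters in COUNTERS.items():
        print(kernel)
        for name, rate in miss_rates(counters).items():
            print(f"  {name}: {rate:.1f}%")
```

Note that the absolute counts differ between the two runs as well as the
ratios, so the percentages alone do not say how many misses occurred per
unit of work done.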
Next I'll work on making multi-JVM more of an improvement, and
I'll also address any incoming regression reports.
I have issues with multiple KVM VMs running either JBB or
dbench-in-tmpfs, and I suspect that what I am seeing is similar to the
multi-JVM behavior on bare metal. What I typically see is that none of
the VMs really converges on a single node for its resource usage. For
example, when running 8 VMs with 10 vCPUs each, a VM may have the
following resource usage: