> We have done some performance evaluation with the locktorture module
> as well as with several benchmarks from the will-it-scale repo.
> The following locktorture results are from an Oracle X5-4 server
> (four Intel Xeon E7-8895 v3 @ 2.60GHz sockets with 18 hyperthreaded
> cores each). Each number represents an average (over 25 runs) of the
> total number of ops (x10^7) reported at the end of each run. The
> standard deviation is also reported in (), and in general is about 3%
> from the average. The 'stock' kernel is v5.12.0,

I assume the X5-4 server has a crossbar topology and its NUMA diameter
is 1 hop, and that all tests were done on this kind of symmetrical
topology. Am I right?
┌─┐                 ┌─┐
│ ├─────────────────┤ │
└┬┘1               1└┬┘
 │   1           1   │
 │     1       1     │
 │       1   1       │
 │         1         │
 │       1   1       │
 │     1       1     │
 │   1           1   │
┌┴┐1               1┌┴┐
│ ├─────────────────┤ │
└─┘                 └─┘
What if the hardware uses a ring topology, or some other topology with
2 hops or even 3 hops, such as:
┌─┐                 ┌─┐
│ ├─────────────────┤ │
└┬┘                 └┬┘
 │                   │
 │                   │
 │                   │
 │                   │
 │                   │
 │                   │
 │                   │
┌┴┐                 ┌┴┐
│ ├─────────────────┤ │
└─┘                 └─┘
or:
┌───┐       ┌───┐      ┌────┐      ┌─────┐
│   │       │   │      │    │      │     │
│   ├───────┤   ├──────┤    ├──────┤     │
│   │       │   │      │    │      │     │
└───┘       └───┘      └────┘      └─────┘
Do we need to consider the distances between NUMA nodes in the
secondary queue? Does it still make sense to treat everyone else as
equal in the secondary queue?
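
To make the question concrete, here is a minimal user-space sketch (not
taken from the patch set) that computes the NUMA diameter, i.e. the
largest hop count between any two nodes, for the three four-node
topologies drawn above. The function numa_diameter() and the adjacency
tables crossbar, ring and chain are hypothetical names I made up for
illustration, and the hop counts come from a plain Floyd-Warshall pass
rather than from anything the kernel reports.

```c
#include <assert.h>

#define N   4      /* four NUMA nodes, as in the diagrams (assumption) */
#define INF 1000   /* "no path yet"; larger than any real hop count */

/* Hypothetical adjacency tables: 1 means a direct link between nodes. */
const int crossbar[N][N] = {
	{0, 1, 1, 1}, {1, 0, 1, 1}, {1, 1, 0, 1}, {1, 1, 1, 0}
};
const int ring[N][N] = {
	{0, 1, 0, 1}, {1, 0, 1, 0}, {0, 1, 0, 1}, {1, 0, 1, 0}
};
const int chain[N][N] = {
	{0, 1, 0, 0}, {1, 0, 1, 0}, {0, 1, 0, 1}, {0, 0, 1, 0}
};

/* Hypothetical helper: all-pairs hop counts, returns the largest one. */
int numa_diameter(const int adj[N][N])
{
	int d[N][N];
	int i, j, k, max = 0;

	/* Start with direct links only. */
	for (i = 0; i < N; i++)
		for (j = 0; j < N; j++)
			d[i][j] = (i == j) ? 0 : (adj[i][j] ? 1 : INF);

	/* Floyd-Warshall: allow paths through intermediate node k. */
	for (k = 0; k < N; k++)
		for (i = 0; i < N; i++)
			for (j = 0; j < N; j++)
				if (d[i][k] + d[k][j] < d[i][j])
					d[i][j] = d[i][k] + d[k][j];

	for (i = 0; i < N; i++)
		for (j = 0; j < N; j++)
			if (d[i][j] > max)
				max = d[i][j];

	assert(max < INF);	/* the topology must be connected */
	return max;
}
```

With these tables, numa_diameter() gives 1 for the crossbar, 2 for the
ring and 3 for the chain, so on the latter two the waiters parked in
the secondary queue are not all equally far from the lock holder.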