Sure. I have a 2-socket server with 16 threads per socket. I take one CPU
offline in socket 2, so I have 16 threads on socket 1 and 15 on socket 2.
That's 31 threads in total, so I'm requesting 31 vectors.
Currently, vecs_per_node is calculated in the first iteration as 31 / 2, so 15.
ncpus of socket 1 is 16. cpus_per_vec = 16 / 15, so 1 CPU per vector
with one extra.
When iterating the second socket, though, vecs_per_node is incremented
from 15 to 16 (to account for the "extra" from before). However, the
ncpus is only 15, so that iteration calculates:
cpus_per_vec = 15 / 16
And since that's zero, the remaining 16 vectors are not assigned to any
CPU, so none of the second socket's CPUs has a vector assigned to it.
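
To make the arithmetic easier to follow, here is a minimal sketch that only
replays the numbers from the example above. It is not the kernel's spreading
code: cpus_per_vec() and walk_example() are hypothetical helpers made up for
this illustration, and the increment of vecs_per_node after the first socket
is hardcoded to match the description rather than derived the way the real
loop does it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: the integer division described in the example. */
int cpus_per_vec(int ncpus, int vecs_per_node)
{
	assert(vecs_per_node > 0);
	return ncpus / vecs_per_node;
}

/*
 * Replays the 2-socket example: 16 online CPUs on socket 1, 15 on
 * socket 2, 31 vectors requested. Returns cpus_per_vec for the last
 * socket visited.
 */
int walk_example(void)
{
	const int ncpus[2] = { 16, 15 };
	const int nr_vecs = 31;
	int vecs_per_node = nr_vecs / 2;	/* 31 / 2 = 15 */
	int cpv = 0;

	for (int n = 0; n < 2; n++) {
		cpv = cpus_per_vec(ncpus[n], vecs_per_node);
		printf("socket %d: ncpus=%d vecs_per_node=%d cpus_per_vec=%d\n",
		       n + 1, ncpus[n], vecs_per_node, cpv);

		/*
		 * Assumption for this sketch: after the first socket the
		 * count is bumped from 15 to 16 to absorb the "extra".
		 */
		if (n == 0)
			vecs_per_node++;
	}

	return cpv;
}
```

Calling walk_example() from a main() shows cpus_per_vec=1 for socket 1 and
cpus_per_vec=0 for socket 2, which is the case where the second socket ends
up with nothing.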