Good stuff, I just gave the patch a spin and things seem a little
calmer. However, I'm still seeing a lot of balancing going on within a
node.
This is a clearly recognizable edge case, so I'll try drawing this up on
some paper and see if I can suggest another patch. There's no good reason
to move one lone process from a particular processor to another idle one.
But it also approaches a question that's come up before: if you have 2
tasks on processor A and 1 on processor B, do you move one from A to B?
One argument is that the two tasks on A will take twice as long as
the one on B if you do nothing. But another says that bouncing a task
around can't correct the overall imbalance and so is wasteful. I know
of benchmarks where both behaviors are considered important. Thoughts?
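
To put rough numbers on the 2-vs-1 case, here is a minimal sketch in C.
It is not from any patch in this thread: the function names are made up
for illustration, and it assumes equal-priority, CPU-bound tasks and
ignores migration and cache-refill cost entirely.

```c
#include <assert.h>

/*
 * Hypothetical helper: fraction of a CPU each task gets if tasks are
 * left where they are and the CPU time-slices evenly between them.
 */
static double static_share(int tasks_on_cpu)
{
	assert(tasks_on_cpu > 0);
	return 1.0 / tasks_on_cpu;
}

/*
 * Hypothetical helper: fraction of a CPU each task would get if the
 * tasks were rotated between CPUs so everyone ends up with the same
 * amount of CPU time (the "fairness" ideal).
 */
static double fair_share(int ntasks, int ncpus)
{
	assert(ntasks > 0 && ncpus > 0);
	if (ntasks <= ncpus)
		return 1.0;	/* every task can have a CPU to itself */
	return (double)ncpus / ntasks;
}
```

With 2 tasks on A and 1 on B, static_share() gives 0.5 for each task on A
and 1.0 for the task on B, while fair_share(3, 2) is about 0.67 for all
three. Both CPUs are fully busy either way, so aggregate throughput is
unchanged; rotation only buys fairness, and it pays for it in migrations.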
It's the classic fairness vs throughput thing we've argued about before.
Most workloads don't have that static a number of processes, but the
balancer probably does need to move a task if the imbalance is
persistent ... but much more reluctantly than normal balancing. See the
patch I sent out a bit
earlier to test it - that may be *too* extreme in the other direction,
but it should confirm what's going on, at least.
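
For the sake of discussion, "more reluctantly" could look something like
the sketch below. This is not the patch mentioned above and not real
scheduler code: the struct, the function, and the PERSIST_TICKS threshold
are all hypothetical, and run-queue lengths are simply passed in as
plain ints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: balance passes a one-task imbalance must survive. */
#define PERSIST_TICKS 8

struct pair_state {
	int persist;	/* consecutive passes a one-task imbalance was seen */
};

/*
 * Decide whether to pull one task from the busiest CPU to this one.
 * busiest_nr / this_nr are the runnable task counts on each CPU.
 */
static bool should_migrate(struct pair_state *st, int busiest_nr, int this_nr)
{
	int diff = busiest_nr - this_nr;

	assert(st != NULL);

	/* Nothing to gain, or a lone task: never bounce it to an idle CPU. */
	if (diff <= 0 || busiest_nr < 2) {
		st->persist = 0;
		return false;
	}

	/* A gap of two or more is a real imbalance: fix it right away. */
	if (diff >= 2) {
		st->persist = 0;
		return true;
	}

	/*
	 * diff == 1: moving a task only relocates the imbalance, so do it
	 * only after it has persisted for PERSIST_TICKS passes in a row.
	 */
	if (++st->persist < PERSIST_TICKS)
		return false;

	st->persist = 0;
	return true;
}
```

With this, 1-vs-0 never migrates, 2-vs-0 migrates immediately, and 2-vs-1
migrates once every PERSIST_TICKS passes, so a persistent imbalance still
gets rotated for fairness, just slowly. The right threshold would have
to come from measurement; 8 is only a placeholder.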