If CPU 8 has 2 tasks, and CPU 1 has 1 task, there's an imbalance of 1.
*If* that imbalance persists (and it probably won't, given tasks being
created, destroyed, and blocking for IO), we may want to rotate that to
1 vs 2, and then back to 2 vs 1, etc. in the interests of fairness,
even though it's slower throughput overall.

Yes, although as long as it's node local and happens a couple of
times a second you should be pretty hard pressed noticing the
difference.
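
To put rough numbers on the fairness side of this, here is a toy sketch (not the kernel's balancer) of three always-runnable tasks on two CPUs. The function name `cpu_shares`, the task names, and the `rotate_every` parameter are all made up for illustration, and the model deliberately ignores migration cost and cache warmth, which is exactly the part being argued about.

```python
def cpu_shares(ticks, rotate_every=None):
    """Toy model: 3 always-runnable tasks on 2 CPUs.

    Each tick, one task has a CPU to itself (gets 1.0 tick of CPU time)
    and the other two share the second CPU (0.5 tick each).
    If rotate_every is set, the task running alone changes every
    rotate_every ticks. Migration cost is NOT modelled.
    """
    tasks = ["A", "B", "C"]  # hypothetical task names
    got = {t: 0.0 for t in tasks}
    alone = 0  # index of the task that currently has a CPU to itself
    for tick in range(ticks):
        if rotate_every and tick > 0 and tick % rotate_every == 0:
            alone = (alone + 1) % len(tasks)
        for i, t in enumerate(tasks):
            got[t] += 1.0 if i == alone else 0.5
    total = sum(got.values())
    return {t: got[t] / total for t in tasks}


static = cpu_shares(ticks=300)                     # imbalance never rotated
rotated = cpu_shares(ticks=300, rotate_every=10)   # rotated every 10 ticks
```

In this model the static case leaves one task with half of all CPU time and the other two with a quarter each, while rotation evens the shares out to a third each. The total CPU time handed out is the same in both cases only because the model charges nothing for moving a task; any real throughput loss from rotation would come from the cache effects discussed here.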
Not sure how true that turns out to be in practice ... probably depends
heavily on both the workload (how heavily it's using the cache) and the
chip (larger caches have proportionately more to lose).
As we go forward in time, cache warmth gets increasingly important, as
CPU speeds increase faster than memory speeds. Cache sizes also get larger.
I'd really like us to be conservative here - the unfairness thing is
really hard to hit anyway - you need a static number of processes that
don't ever block on IO or anything.