On Mon, Mar 07, 2005 at 07:28:23PM +1100, Nick Piggin wrote:
Siddha, Suresh B wrote:
We are resetting the nr_balance_failed to cache_nice_tries after kicking active balancing. But can_migrate_task will succeed only if
nr_balance_failed > cache_nice_tries.
It is indeed, thanks for catching that. We should probably make it
reset the count to the point where it will start moving cache hot
tasks (ie. cache_nice_tries+1).
That still might not be enough. We probably need to pass push_cpu's
sd to move_tasks call in active_load_balance, instead of current busiest_cpu's
sd. Just like push_cpu, we need to add one more field to the runqueue which will specify the domain level of the push_cpu at which we have an imbalance.
Ah yep, right you are there, too. I obviously hadn't looked closely
enough at the recent active_load_balance patches that had gone in :(
What should probably do is heed the "push_cpu" prescription (push_cpu
is now unused).
push_cpu might not be the ideal destination in all cases. Take a NUMA domain
above SMT+SMP domains in my above example. Assume P0, P1 is in node-0 and
P2, P3 in node-1. Assume Loads of P0,P1,P2 are same as the above example,with P3
containing one process load. Now any idle thread in P2 or P3 can trigger
active load balance on P0. We should be selecting thread in P2 ideally
(currently this is what we get with idle package check). But with push_cpu,
we might move to the idle thread in P3 and then finally move to P2(it will be a
two step process)