Re: [PATCH 2.5.58] new NUMA scheduler: fix

From: Michael Hohnbaum (
Date: Thu Jan 16 2003 - 18:45:40 EST

On Thu, 2003-01-16 at 12:19, Ingo Molnar wrote:
> Ingo
> (*) whether sched_balance_exec() is a high-frequency path or not is up to
> debate. Right now it's not possible to get much more than a couple of
> thousand exec()'s per second on fast CPUs. Hopefully that will change in
> the future though, so exec() events could become really fast. So i'd
> suggest to only do local (ie. SMP-alike) balancing in the exec() path, and
> only do NUMA cross-node balancing with a fixed frequency, from the timer
> tick. But exec()-time is really special, since the user task usually has
> zero cached state at this point, so we _can_ do cheap cross-node balancing
> as well. So it's a boundary thing - probably doing the full-blown
> balancing is the right thing.
The reason for doing load balancing on every exec is that, as you say,
it is cheap to do the balancing at this point - no cache state, minimal
memory allocations. If we did not balance at this point and relied on
balancing from the timer tick, there would be much more movement of
established processes between nodes, which is expensive. Ideally, the
exec balancing is good enough so that on a well functioning system there
is no inter-node balancing taking place at the timer tick. Our testing
has shown that the exec load balance code does a very good job of
spreading processes across nodes, and thus, very little timer tick
balancing. The timer tick internode balancing is there as a safety
valve for those cases where exec balancing is not adequate. Workloads
with long running processes, and workloads with processes that do lots
of forks but not execs, are examples.

An earlier version of Erich's initial load balancer provided for
the option of balancing at either fork or exec, and the capability
of selecting on a per-process basis which to use. That could be
used if workloads that do forks with no execs become a problem on
NUMA boxes.


Michael Hohnbaum 503-578-5486 T/L 775-5486

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to More majordomo info at Please read the FAQ at

This archive was generated by hypermail 2b29 : Thu Jan 23 2003 - 22:00:14 EST