On Monday 20 January 2003 17:56, Ingo Molnar wrote:
> On Mon, 20 Jan 2003, Erich Focht wrote:
> > Could you please explain your idea? As far as I understand, the SMP
> > balancer (pre-NUMA) tries a global rebalance at each call. Maybe you
> > mean something different...
> yes, but e.g. in the idle-rebalance case we are more aggressive at moving
> tasks across SMP CPUs. We could perhaps do a similar ->nr_balanced logic
> to do this 'aggressive' balancing even if not triggered from the
> CPU-will-be-idle path. Ie. _perhaps_ the SMP balancer could become a bit
> more aggressive.
Do you mean: make the SMP balancer more aggressive by lowering the
> ie. SMP is just the first level in the cache-hierarchy, NUMA is the second
> level. (let's hope we don't have to deal with a third caching level anytime
> soon - although that could as well happen once SMT CPUs start doing NUMA.)
> There's no real reason to do balancing in a different way on each level -
> the weight might be different, but the core logic should be synced up.
> (one thing that is indeed different for the NUMA step is locality of
> uncached memory.)
We have an IA64 2-level node hierarchy machine with 32 CPUs (NEC
TX7). In the "old" node-affine scheduler patch the multilevel feature
was implemented through different cross-node steal delays (longer if the
node is further away). In the current approach we could just add another
counter, such that we call the cross-supernode balancer only if the
intra-supernode balancer failed a few times. No idea whether this helps...
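
To make the "another counter" idea concrete, here is a rough sketch of how
two failure counters could gate three balancing levels. This is not code from
any patch: the names (lb_counters, pick_level, record_result), the two limits
and the reset-on-success behaviour are all assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical balancing levels for a 2-level node hierarchy. */
enum lb_level { LB_NODE, LB_SUPERNODE, LB_GLOBAL };

/* Hypothetical per-runqueue counters; neither name is from the thread. */
struct lb_counters {
    int node_failed;      /* failed intra-node balance attempts */
    int supernode_failed; /* failed intra-supernode balance attempts */
};

/* Pick the narrowest level whose failure counter is still below its limit. */
static enum lb_level pick_level(const struct lb_counters *c,
                                int node_limit, int supernode_limit)
{
    if (c->node_failed < node_limit)
        return LB_NODE;
    if (c->supernode_failed < supernode_limit)
        return LB_SUPERNODE;
    return LB_GLOBAL;
}

/* Record the outcome of one attempt at the given level. */
static void record_result(struct lb_counters *c, enum lb_level level,
                          bool pulled_task)
{
    if (pulled_task || level == LB_GLOBAL) {
        /* Assumption: a success, or a widest-level attempt, starts over. */
        c->node_failed = 0;
        c->supernode_failed = 0;
        return;
    }
    if (level == LB_NODE)
        c->node_failed++;
    else
        c->supernode_failed++;
}
```

With both limits set to 2, two failed intra-node attempts would be followed by
up to two intra-supernode attempts before the cross-supernode balancer is
tried. Whether that ordering helps on real hardware is exactly the open
question above.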
> > Yes! Actually the currently implemented nr_balanced logic is pretty
> > dumb: the counter reaches the cross-node balance threshold after a
> > certain number of calls to intra-node lb, no matter whether these were
> > successful or not. I'd like to try incrementing the counter only on
> > unsuccessful load balances, this would give a clear priority to
> > intra-node balancing and a clear and controllable delay for cross-node
> > balancing. A tiny patch for this (for 2.5.59) is attached. As the name
> > nr_balanced would be misleading for this kind of usage, I renamed it to
> > nr_lb_failed.
> indeed this approach makes much more sense than the simple ->nr_balanced
> counter. A similar approach makes sense on the SMP level as well: if the
> current 'busy' rebalancer fails to get a new task, we can try the current
> 'idle' rebalancer. Ie. a CPU going idle would do the less intrusive
> rebalancing first.
> have you experimented with making the counter limit == 1 actually? Ie.
> immediately trying to do a global balancing once the less intrusive
> balancing fails?
Didn't have time to try and probably won't be able to check this
before the beginning of next week :-( .
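
For reference, the failure-counting logic discussed above boils down to
something like the following sketch. It is not the attached patch: only the
name nr_lb_failed comes from this thread, while rq_sketch, next_scope,
note_intra_result, the limit parameter and the reset on success are
assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the runqueue; only nr_lb_failed is from the thread. */
struct rq_sketch {
    int nr_lb_failed; /* failed intra-node balance attempts since last reset */
};

enum lb_scope { LB_INTRA_NODE, LB_CROSS_NODE };

/* Go cross-node only after `limit` failed intra-node attempts. */
static enum lb_scope next_scope(const struct rq_sketch *rq, int limit)
{
    return rq->nr_lb_failed >= limit ? LB_CROSS_NODE : LB_INTRA_NODE;
}

/* Count only unsuccessful attempts; assumption: a success resets the counter. */
static void note_intra_result(struct rq_sketch *rq, bool pulled_task)
{
    if (pulled_task)
        rq->nr_lb_failed = 0;
    else
        rq->nr_lb_failed++;
}
```

With limit == 1, as suggested, a single failed intra-node attempt makes the
next call go cross-node immediately. Larger limits give intra-node balancing
more chances first.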