Re: [patch] scheduler: rebalance_tick interval update
From: Nick Piggin
Date:  Mon Nov 15 2004 - 21:19:30 EST
Matthew Dobson wrote:
On Mon, 2004-11-15 at 17:17, Nick Piggin wrote:
Darren Hart wrote:
The current rebalance_tick() code advances each sched_domain's
last_balance field by interval (+= interval) after performing a load_balance.  If
interval is 10, this has the effect of saying:  we want to run
load_balance at time = 10, 20, 30, 40, etc...  If for example
last_balance=10 and for some reason rebalance_tick can't be run until
30, load_balance will be called and last_balance will be updated to 20,
causing it to call load_balance again immediately the next time it is
called since the interval is 10 and we are already at >30.  It seems to
me that it would make much more sense for last_balance to be assigned
jiffies after a load_balance, then the meaning of last_balance is more
exact: "this domain was last balanced at jiffies" rather than "we last
handled the balance we were supposed to do at 20, at some indeterminate
time".  The following patch makes this change.
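A rough userspace sketch of the catch-up behaviour described above, not kernel code. The enum and the function name are made up for illustration; only the two update rules (+= interval versus assigning the current jiffies) come from this thread. It counts how many of the ticks in a window would call load_balance once ticks start arriving again.

```c
/* Hypothetical names, used only for this illustration. */
enum policy { ADD_INTERVAL, SET_TO_NOW };

/*
 * Walk `nticks` consecutive ticks starting at `first_tick` and count how
 * many of them would call load_balance(), given the stored last_balance.
 */
static int balances_in_window(unsigned long last_balance,
                              unsigned long first_tick,
                              unsigned long nticks,
                              unsigned long interval,
                              enum policy p)
{
	int count = 0;

	for (unsigned long j = first_tick; j < first_tick + nticks; j++) {
		if (j - last_balance >= interval) {
			count++;                        /* load_balance() would run here */
			if (p == ADD_INTERVAL)
				last_balance += interval;   /* current code */
			else
				last_balance = j;           /* proposed patch */
		}
	}
	return count;
}
```

With last_balance=10, interval=10 and ticks resuming at 30, balances_in_window(10, 30, 10, 10, ADD_INTERVAL) counts two balances on back-to-back ticks (30 and 31), while SET_TO_NOW counts one.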
Hi Darren,
This is how I first implemented it... but I think this will cause
rebalance points of each processor to tend to become synchronised
(rather than staggered) as ticks get lost.
But isn't that what this is supposed to stop:
        unsigned long j = jiffies + CPU_OFFSET(this_cpu);
....
                if (j - sd->last_balance >= interval) {
                        if (load_balance(this_cpu, this_rq, sd, idle)) {
                                /* We've pulled tasks over so no longer idle */
                                idle = NOT_IDLE;
                        }
                        sd->last_balance += interval;
                }
The CPU_OFFSET() macro is designed to spread out the balancing so they
don't all occur at the same time, no?
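To make the staggering idea concrete, here is a small sketch. The definition of CPU_OFFSET() is not quoted in this thread, so the helper below only assumes it spreads CPUs across a fraction of HZ (roughly HZ * cpu / NR_CPUS). The SKETCH_ constants and both function names are hypothetical, picked so that two CPUs end up 5 jiffies apart.

```c
#define SKETCH_HZ       10   /* assumed tick rate, for illustration only */
#define SKETCH_NR_CPUS  2    /* assumed CPU count, for illustration only */

/* Assumed shape of CPU_OFFSET(); the real definition is not shown here. */
static long sketch_cpu_offset(int cpu)
{
	return (long)SKETCH_HZ * cpu / SKETCH_NR_CPUS;
}

/*
 * First jiffy at which this CPU's check
 * (jiffies + offset - last_balance >= interval) passes.
 */
static long first_due_jiffy(int cpu, long last_balance, long interval)
{
	long jiffies = 0;

	while (jiffies + sketch_cpu_offset(cpu) - last_balance < interval)
		jiffies++;
	return jiffies;
}
```

Starting both CPUs from last_balance=0 with interval=10, CPU0 first becomes due at jiffy 10 and CPU1 at jiffy 5, so the offset alone gives the initial stagger. Whether that stagger survives lost ticks depends on how last_balance is updated afterwards.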
Yes, but if you balance n ticks after the last _rebalance_, then the balance
points are able to drift. Let's say there are 2 CPUs that balance at 10-jiffy
intervals, 5 jiffies apart:
jiffy   CPU0                              CPU1
0       rebalance (next, 10)
5                                         rebalance (next, 15)
10      rebalance (next, 20)
15                                        rebalance can't be run until
                                          30 as per Darren's example.
20      rebalance (next, 30)
30      rebalance (next, 40)              rebalance (next, 40)
So CPU0 and CPU1 are now synchronised. Having the next balance be calculated
from the _current_ time leaves you open to all sorts of these drift issues.
Another example: on some ticks a CPU won't see the updated 'jiffies', while on
others it will (this can happen on Altix systems, at least).
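The two-CPU table above can be replayed with a small userspace simulation. This is only a sketch: the enum, the function name and the stall window parameters are invented for illustration, and lost ticks are modelled simply as jiffies on which the check never runs.

```c
/* Hypothetical names, used only for this illustration. */
enum update_rule { RULE_ADD_INTERVAL, RULE_SET_NOW };

/*
 * Simulate one CPU ticking from jiffy 0 up to (not including) `end`.
 * Ticks in [stall_from, stall_to) are lost. `first` is the jiffy of the
 * CPU's first balance. Returns the jiffy at which the last balance ran.
 */
static long last_balance_jiffy(long first, long interval,
                               long stall_from, long stall_to,
                               long end, enum update_rule rule)
{
	long last_balance = first - interval;  /* so the first balance lands on `first` */
	long ran_at = -1;

	for (long j = 0; j < end; j++) {
		if (j >= stall_from && j < stall_to)
			continue;                      /* tick lost: the check never runs */
		if (j - last_balance >= interval) {
			ran_at = j;                    /* load_balance() would run here */
			if (rule == RULE_ADD_INTERVAL)
				last_balance += interval;  /* keeps the original phase */
			else
				last_balance = j;          /* phase follows the actual run time */
		}
	}
	return ran_at;
}
```

Running to jiffy 100 with CPU1 stalled over [15, 30): under RULE_SET_NOW both CPUs last balance at 90, i.e. they have become synchronised. Under RULE_ADD_INTERVAL CPU0 ends at 90 and CPU1 at 95, so the 5-jiffy stagger is preserved, at the cost of a back-to-back catch-up balance on CPU1 right after the stall.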