Re: [REGRESSION 2.6.30][PATCH v3] sched: update load count only once per cpu in 10 tick update window

From: Peter Zijlstra
Date: Mon Apr 19 2010 - 16:52:50 EST


(stripped the ubuntu kernel-team list, since it is generating bounces
for each email)

On Mon, 2010-04-19 at 13:16 -0700, Chase Douglas wrote:

> > Also, since it's all NO_HZ, why not stick this in with the ILB? Once
> > people get around to making that scale better, this can hitch a ride.
> >
> > Something like the below perhaps? It does run partially from softirq
> > context, but since there's a distinct lack of synchronization here that
> > didn't seem like an immediate problem.
>
> I understand everything until you move the calc_load_account_active
> call to run_rebalance_domains. I take it that when CPUs go NO_HZ idle,
> at least one cpu is left to monitor and perform updates as necessary.

Right, that is the idea.

> Conceptually, it makes sense that this cpu should be handling the load
> accounting updates. However, I'm new to this code, so I'm having a
> hard time understanding all the cases and timings for when the
> scheduler softirq is called. Is it guaranteed to be called during
> every 10 tick load update window? If not, then we'll have the issue
> where a NO_HZ idle cpu won't be updated to 0 running tasks in time for
> the load avg calculation.

Ah, I overlooked that trigger_load_balance() already has a jiffy delay.
I was ass-uming we triggered the softirq on each tick.

Yes, that needs a bit of a fix so the softirq gets called at least once
every 10 ticks; looking at rebalance_domains(), the interval can end up
being 60s.
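
To make the timing problem concrete, here is a minimal userspace sketch
(not kernel code) of one way to bound the delay: treat the next load
update as a second deadline and raise the softirq at whichever deadline
comes first. The names tick_after(), earliest_deadline(),
should_raise_softirq() and example_usage() are hypothetical and exist
only for this illustration; deadlines are plain tick counts.

```c
#include <assert.h>
#include <stdbool.h>

/* Wraparound-safe "a is after b" for an unsigned tick counter,
 * written in the style of the kernel's time_after(). */
bool tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Hypothetical helper: return whichever deadline comes first. */
unsigned long earliest_deadline(unsigned long a, unsigned long b)
{
    return tick_after(a, b) ? b : a;
}

/* Hypothetical check: raise the softirq once either the balance
 * deadline or the load update deadline has been reached. */
bool should_raise_softirq(unsigned long now,
                          unsigned long next_balance,
                          unsigned long next_load_update)
{
    unsigned long deadline = earliest_deadline(next_balance,
                                               next_load_update);

    return !tick_after(deadline, now); /* true when now >= deadline */
}

/* Usage: balance is 60000 ticks away, the load update only 5. */
void example_usage(void)
{
    assert(!should_raise_softirq(100, 60100, 105)); /* too early */
    assert(should_raise_softirq(105, 60100, 105));  /* update due */
}
```

With the clamp in place a long balance interval no longer decides when
the load accounting runs; the load update deadline does.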

> Would someone be able to explain how we are guaranteed of the correct
> timing for this path?
>
> I also have a concern with run_rebalance_domains: If the designated
> no_hz.load_balancer cpu wasn't idle at the last tick or needs
> rescheduling, load accounting won't occur for idle cpus. Is it
> possible for this to occur every time when called in the 10 tick
> update window?

Right, so I didn't look too closely either, but was going more for the
structure than the details. I haven't read through the ILB stuff in a
while.

From what I can quickly see, trigger_load_balance() will check if the
current cpu is idle_at_tick, if not it will nominate another cpu to be
ilb -- so I guess it neatly fits together, the !idle cpus fend for
themselves and the idle ones get sorted by the ILB.
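
The split described above can be sketched in userspace C as follows.
This is an illustration of the idea only, not the kernel's
trigger_load_balance(): the enum, pick_accounting_path() and
nominate_ilb() are hypothetical names, and "first idle cpu" is an
assumed nomination policy chosen to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>

enum accounting_path { ACCOUNT_SELF, ACCOUNT_VIA_ILB };

/* A cpu that was busy at the tick fends for itself; a cpu that was
 * idle at the tick is left to the idle load balancer (ILB). */
enum accounting_path pick_accounting_path(bool idle_at_tick)
{
    return idle_at_tick ? ACCOUNT_VIA_ILB : ACCOUNT_SELF;
}

/* Hypothetical nomination: keep the current ILB cpu while it is
 * still idle, otherwise pick the first idle cpu. Returns -1 when
 * no cpu is idle, in which case no ILB is needed at all. */
int nominate_ilb(const bool idle_at_tick[], int ncpus, int current_ilb)
{
    assert(ncpus >= 0 && current_ilb < ncpus);

    if (current_ilb >= 0 && idle_at_tick[current_ilb])
        return current_ilb;

    for (int cpu = 0; cpu < ncpus; cpu++)
        if (idle_at_tick[cpu])
            return cpu;

    return -1;
}
```

The case Chase asks about, where the designated cpu turns out to be
busy, is the first branch failing: the role then moves to another idle
cpu instead of the idle cpus going unaccounted.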

Alternatively you could add the calc_load_tasks_deferred thing to the
nohz structure and do it all from trigger_load_balance.
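
A rough userspace sketch of that alternative, using C11 atomics rather
than the kernel's atomic_long_t API: idle cpus park their delta in a
deferred counter, and the tick path folds it into the global count.
struct nohz_sketch, defer_idle_delta(), fold_deferred() and
example_fold() are hypothetical names; only calc_load_tasks_deferred is
taken from the discussion above.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for the nohz structure, reduced to the one field. */
struct nohz_sketch {
    atomic_long calc_load_tasks_deferred; /* deltas from idle cpus */
};

/* Called by a cpu going NO_HZ idle: park its delta for later. */
void defer_idle_delta(struct nohz_sketch *nohz, long delta)
{
    atomic_fetch_add(&nohz->calc_load_tasks_deferred, delta);
}

/* Called from the tick path: take everything parked so far and add
 * it to the global count. Returns the updated global count. */
long fold_deferred(struct nohz_sketch *nohz, atomic_long *calc_load_tasks)
{
    long delta = atomic_exchange(&nohz->calc_load_tasks_deferred, 0L);

    if (delta)
        atomic_fetch_add(calc_load_tasks, delta);

    return atomic_load(calc_load_tasks);
}

/* Usage: three tasks counted, then two cpus go idle and give up
 * one and two tasks respectively. */
void example_fold(void)
{
    struct nohz_sketch nohz;
    atomic_long total;

    atomic_init(&nohz.calc_load_tasks_deferred, 0L);
    atomic_init(&total, 3L);

    defer_idle_delta(&nohz, -1);
    defer_idle_delta(&nohz, -2);
    assert(fold_deferred(&nohz, &total) == 0);
}
```

The exchange hands each parked delta to exactly one folder, so a second
fold with nothing new deferred leaves the global count unchanged.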

Either approach would work if the ILB were extended to be per node or
something like that (Venki used to work on that, not sure what happened
to that).
