Re: [patch 3/6] sched, nohz: sched group, domain aware nohz idleload balancing

From: Peter Zijlstra
Date: Thu Nov 24 2011 - 06:48:07 EST


On Fri, 2011-11-18 at 15:03 -0800, Suresh Siddha wrote:
> static inline int nohz_kick_needed(struct rq *rq, int cpu)
> {
> unsigned long now = jiffies;
> struct sched_domain *sd;
>
> + if (unlikely(idle_cpu(cpu)))
> + return 0;
> +
> /*
> * We were recently in tickless idle mode. At the first busy tick
> * after returning from idle, we will update the busy stats.
> @@ -5120,36 +5047,43 @@ static inline int nohz_kick_needed(struc
> if (unlikely(test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))) {
> clear_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu));
>
> + cpumask_clear_cpu(cpu, nohz.idle_cpus_mask);
> + atomic_dec(&nohz.nr_cpus);
> +
> for_each_domain(cpu, sd)
> atomic_inc(&sd->groups->sgp->nr_busy_cpus);
> }
>
> + /*
> + * None are in tickless mode and hence no need for NOHZ idle load
> + * balancing.
> + */
> + if (likely(!atomic_read(&nohz.nr_cpus)))
> return 0;
>
> + if (time_before(now, nohz.next_balance))
> return 0;
>
> + if (rq->nr_running >= 2)
> + goto need_kick;
>
> + for_each_domain(cpu, sd) {
> + struct sched_group *sg = sd->groups;
> + struct sched_group_power *sgp = sg->sgp;
> + int nr_busy = atomic_read(&sgp->nr_busy_cpus);
> +
> + if (nr_busy > 1 && (nr_busy * SCHED_LOAD_SCALE > sgp->power))
> + goto need_kick;

This looks wrong; it's basically always true for a box with HT.

sgp->power is a measure of how much compute power this group has. Its
basic form is sg->group_weight * SCHED_POWER_SCALE, and it is reduced from
there: HT siblings get less since they're not as powerful as two actual
cores, and we deduct time spent on RT tasks, IRQs, etc.

So how does comparing the load of non-nohz cpus to that make sense?
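
To put rough numbers on the objection, here is a minimal sketch of the check
lifted out of nohz_kick_needed(). The constants are assumptions for
illustration only: both scales are taken as 1024, and the combined power of a
two-thread HT core is taken as 1178 (roughly a 15% gain over a single
thread). The helper names are hypothetical and do not exist in the kernel.

```c
/* Illustrative only: constants and helper names are assumptions, not kernel code. */
#define SCHED_POWER_SCALE 1024UL  /* assumed value */
#define SCHED_LOAD_SCALE  1024UL  /* assumed value */
#define SMT_GAIN          1178UL  /* assumed: power of one core with two HT siblings */

/* Basic form of group power, before any reduction is applied. */
static unsigned long basic_group_power(unsigned int weight)
{
	return weight * SCHED_POWER_SCALE;
}

/* Assumed power of a group made of two HT siblings sharing one core. */
static unsigned long ht_core_power(void)
{
	return SMT_GAIN;
}

/* The comparison from the patch: busy CPU count scaled up vs. group power. */
static int patch_check(int nr_busy, unsigned long power)
{
	return nr_busy > 1 && (nr_busy * SCHED_LOAD_SCALE > power);
}
```

Under these assumptions, two busy siblings give 2 * 1024 = 2048 against a
group power of 1178, so the check fires whenever both threads of a core are
busy. Even for a group of two full cores at the basic power of 2048, any
deduction for RT or IRQ time pushes the power below 2048 and the check fires
as well.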

> +
> + if (sd->flags & SD_ASYM_PACKING && nr_busy != sg->group_weight
> + && (cpumask_first_and(nohz.idle_cpus_mask,
> + sched_domain_span(sd)) < cpu))
> + goto need_kick;
> }
> +
> return 0;
> +need_kick:
> + return 1;
> }