Re: [RFC PATCH v5 3/7] sched: nominate preferred wakeup cpu

From: Balbir Singh
Date: Mon Dec 15 2008 - 01:42:39 EST


* Vaidyanathan Srinivasan <svaidy@xxxxxxxxxxxxxxxxxx> [2008-12-11 23:12:57]:

> When the system utilisation is low and more cpus are idle,
> then the process waking up from sleep should prefer to
> wakeup an idle cpu from semi-idle cpu package (multi core
> package) rather than a completely idle cpu package which
> would waste power.
>
> Use the sched_mc balance logic in find_busiest_group() to
> nominate a preferred wakeup cpu.
>
> This info can be sored in appropriate sched_domain, but
> updating this info in all copies of sched_domain is not
> practical. Hence this information is stored in root_domain
> struct which is one copy per partitioned sched domain.
> The root_domain can be accessed from each cpu's runqueue
> and there is one copy per partitioned sched domain.
>
> Signed-off-by: Vaidyanathan Srinivasan <svaidy@xxxxxxxxxxxxxxxxxx>
> ---
>
> kernel/sched.c | 12 ++++++++++++
> 1 files changed, 12 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 6bea99b..0918677 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -493,6 +493,14 @@ struct root_domain {
> #ifdef CONFIG_SMP
> struct cpupri cpupri;
> #endif
> +#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
> + /*
> + * Preferred wake up cpu nominated by sched_mc balance that will be
> + * used when most cpus are idle in the system indicating overall very
> + * low system utilisation. Triggered at POWERSAVINGS_BALANCE_WAKEUP(2)

Is the root domain good enough?

What is POWERSAVINGS_BALANCE_WAKEUP(2), is it sched_mc == 2?

> + */
> + unsigned int sched_mc_preferred_wakeup_cpu;
> +#endif
> };
>
> /*
> @@ -3407,6 +3415,10 @@ out_balanced:
>
> if (this == group_leader && group_leader != group_min) {
> *imbalance = min_load_per_task;
> + if (sched_mc_power_savings >= POWERSAVINGS_BALANCE_WAKEUP) {

OK, it is :) (for the question above). Where do we utilize the set
sched_mc_preferred_wakeup_cpu?

> + cpu_rq(this_cpu)->rd->sched_mc_preferred_wakeup_cpu =
> + first_cpu(group_leader->cpumask);

Everytime we balance, we keep replacing rd->sched_mc_preferred_wake_up
with group_lead->cpumask? My big concern is that we do this without
checking if the group_leader has sufficient capacity (after it will
pull in tasks since we made the checks for nr_running and capacity).

> + }
> return group_min;
> }
> #endif
>
>

--
Balbir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/