Re: [PATCH 9/15] sched: Check sched_mn_power_savings when setting flags for CPU and MN domains

From: Gautham R Shenoy
Date: Tue Aug 25 2009 - 05:35:00 EST


On Mon, Aug 24, 2009 at 04:57:42PM +0200, Peter Zijlstra wrote:
> On Thu, 2009-08-20 at 15:40 +0200, Andreas Herrmann wrote:
> > Use new function sd_balance_for_mn_power() and adapt
> > sd_balance_for_package_power() and sd_power_saving_flags() for correct
> > setting of flags SD_POWERSAVINGS_BALANCE and SD_BALANCE_NEWIDLE in CPU
> > and MN domains.
> >
> > Furthermore add flag SD_SHARE_PKG_RESOURCES to MN domain.
> > Rationale: a multi-node processor most likely shares package resources
> > (on Magny-Cours the package constitutes a "voltage domain").
>
> IIRC SD_SHARE_PKG_RESOURCES plays games with the cpu_power of a
> sched_domain, which breaks in all kinds of curious ways, this adds more
> breakage afaict.
>
> ego?

A domain which has SD_SHARE_PKG_RESOURCES set will always have
__cpu_power = SD_LOAD_SCALE if it does not also have the
SD_POWERSAVINGS_BALANCE flag set.

The problem you are talking about is that when you offline a CPU of
such a domain, it will still show the same cpu_power, which can confuse
the scheduler.

Eg:
On a dual-socket, dual-core machine, in the absence of
SD_POWERSAVINGS_BALANCE, the SD_LV_CPU domain, which has
SD_SHARE_PKG_RESOURCES set, will have the cpu_power of both of its
groups set to SD_LOAD_SCALE. If we offline, say, one of the four cores,
the group->cpu_power of the corresponding group will still be
SD_LOAD_SCALE.

This might affect the fairness calculations. For example, if you have 6
tasks running, the ideal placement would be 4 on the socket whose CPUs
are both online and 2 on the socket where one of the CPUs has been
offlined. But in this case we will have 3 + 3, which is not correct.
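
To put numbers on the 3 + 3 versus 4 + 2 split, here is a rough
user-space sketch. It is not kernel code: LOAD_SCALE, the F_* flag
values and both model_* helpers are made up for illustration, and the
"scales with online CPUs" branch is only my assumption of what a
corrected cpu_power would look like.

```c
#include <assert.h>

/* Illustrative stand-ins; these values are assumptions, not kernel definitions. */
#define LOAD_SCALE            1024UL
#define F_SHARE_PKG_RESOURCES 0x1u
#define F_POWERSAVINGS        0x2u

/*
 * Toy model of a group's cpu_power (hypothetical helper).
 * With SHARE_PKG_RESOURCES set and POWERSAVINGS clear, the power is pinned
 * to LOAD_SCALE no matter how many CPUs of the group are online.
 */
static unsigned long model_group_power(unsigned int domain_flags,
                                       unsigned int online_cpus)
{
    if ((domain_flags & F_SHARE_PKG_RESOURCES) &&
        !(domain_flags & F_POWERSAVINGS))
        return LOAD_SCALE;              /* fixed, ignores online_cpus */

    /* Assumed alternative: power grows with the number of online CPUs. */
    return online_cpus * LOAD_SCALE;
}

/*
 * Number of tasks a group gets when nr_tasks are split in proportion to
 * cpu_power (hypothetical helper), rounded to the nearest whole task.
 */
static unsigned int model_task_share(unsigned int nr_tasks,
                                     unsigned long group_power,
                                     unsigned long total_power)
{
    assert(total_power > 0);
    return (unsigned int)((nr_tasks * group_power + total_power / 2) /
                          total_power);
}
```

With the pinned power, a socket with 2 online CPUs and a socket with 1
online CPU both report LOAD_SCALE, so 6 tasks split 3 + 3. If the power
instead tracked the online CPUs (2 * LOAD_SCALE versus 1 * LOAD_SCALE),
the same helper would give 4 + 2.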


>
> > Signed-off-by: Andreas Herrmann <andreas.herrmann3@xxxxxxx>
> > ---
> > arch/x86/include/asm/topology.h | 3 ++-
> > include/linux/sched.h | 14 ++++++++++++--
> > 2 files changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
> > index 6d7d133..4a520b8 100644
> > --- a/arch/x86/include/asm/topology.h
> > +++ b/arch/x86/include/asm/topology.h
> > @@ -198,7 +198,8 @@ static inline void setup_node_to_cpumask_map(void) { }
> > | SD_BALANCE_EXEC \
> > | SD_WAKE_AFFINE \
> > | SD_WAKE_BALANCE \
> > - | sd_balance_for_package_power()\
> > + | SD_SHARE_PKG_RESOURCES\
> > + | sd_balance_for_mn_power()\
> > | sd_power_saving_flags(),\
> > .last_balance = jiffies, \
> > .balance_interval = 1, \
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 5755643..c53bdd8 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -844,9 +844,18 @@ static inline int sd_balance_for_mc_power(void)
> > return 0;
> > }
> >
> > +static inline int sd_balance_for_mn_power(void)
> > +{
> > + if (sched_mc_power_savings || sched_smt_power_savings)
> > + return SD_POWERSAVINGS_BALANCE;
> > +
> > + return 0;
> > +}
> > +
> > static inline int sd_balance_for_package_power(void)
> > {
> > - if (sched_mc_power_savings | sched_smt_power_savings)
> > + if (sched_mn_power_savings || sched_mc_power_savings ||
> > + sched_smt_power_savings)
> > return SD_POWERSAVINGS_BALANCE;
> >
> > return 0;
> > @@ -860,7 +869,8 @@ static inline int sd_balance_for_package_power(void)
> >
> > static inline int sd_power_saving_flags(void)
> > {
> > - if (sched_mc_power_savings | sched_smt_power_savings)
> > + if (sched_mn_power_savings || sched_mc_power_savings ||
> > + sched_smt_power_savings)
> > return SD_BALANCE_NEWIDLE;
> >
> > return 0;

--
Thanks and Regards
gautham