[RFC PATCH v4 1/7] sched: Framework for sched_mc/smt_power_savings=N

From: Vaidyanathan Srinivasan
Date: Fri Nov 21 2008 - 03:28:48 EST


From: Gautham R Shenoy <ego@xxxxxxxxxx>

*** RFC patch of work in progress and not for inclusion. ***

Currently the sched_mc/smt_power_savings variable is a boolean, which either
enables or disables topology based power savings. This extends the behaviour of
the variable from boolean to multivalued, such that based on the value, we
decide how aggressively do we want to perform topology based powersavings
balance.

Variable levels of power saving tunable would benefit end user to match the
required level of power savings vs performance trade off depending on the
system configuration and workloads.

This initial version makes the sched_mc_power_savings global variable to take
more values (0,1,2).

Later version is expected to add new member sd->powersavings_level at the multi
core CPU level sched_domain. This make all sd->flags check for
SD_POWERSAVINGS_BALANCE into a different macro that will check for
powersavings_level.

The power savings level setting should be in one place either in the
sched_mc_power_savings global variable or contained within the appropriate
sched_domain structure.

Signed-off-by: Gautham R Shenoy <ego@xxxxxxxxxx>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@xxxxxxxxxxxxxxxxxx>
---

include/linux/sched.h | 11 +++++++++++
kernel/sched.c | 16 +++++++++++++---
2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 644ffbd..d862837 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -760,6 +760,17 @@ enum cpu_idle_type {
#define SD_SERIALIZE 1024 /* Only a single load balancing instance */
#define SD_WAKE_IDLE_FAR 2048 /* Gain latency sacrificing cache hit */

+enum powersavings_balance_level {
+ POWERSAVINGS_BALANCE_NONE = 0, /* No power saving load balance */
+ POWERSAVINGS_BALANCE_BASIC, /* Fill one thread/core/package
+ * first for long running threads
+ */
+ POWERSAVINGS_BALANCE_WAKEUP, /* Also bias task wakeups to semi-idle
+ * cpu package for power savings
+ */
+ MAX_POWERSAVINGS_BALANCE_LEVELS
+};
+
#define BALANCE_FOR_MC_POWER \
(sched_smt_power_savings ? SD_POWERSAVINGS_BALANCE : 0)

diff --git a/kernel/sched.c b/kernel/sched.c
index 9b1e793..ea33446 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -7876,14 +7876,24 @@ int arch_reinit_sched_domains(void)
static ssize_t sched_power_savings_store(const char *buf, size_t count, int smt)
{
int ret;
+ unsigned int level = 0;

- if (buf[0] != '0' && buf[0] != '1')
+ sscanf(buf, "%u", &level);
+
+ /*
+ * level is always be positive so don't check for
+ * level < POWERSAVINGS_BALANCE_NONE which is 0
+ * What happens on 0 or 1 byte write,
+ * need to check for count as well?
+ */
+
+ if (level >= MAX_POWERSAVINGS_BALANCE_LEVELS)
return -EINVAL;

if (smt)
- sched_smt_power_savings = (buf[0] == '1');
+ sched_smt_power_savings = level;
else
- sched_mc_power_savings = (buf[0] == '1');
+ sched_mc_power_savings = level;

ret = arch_reinit_sched_domains();


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/