Re: [PATCH v2] cpufreq: intel_pstate: Change the calculation of next pstate

From: Dirk Brandewie
Date: Thu May 01 2014 - 17:30:50 EST


On 05/01/2014 02:00 PM, Stratos Karafotis wrote:
Currently the driver calculates the next pstate proportional to
core_busy factor, scaled by the ratio max_pstate / current_pstate.

Using the scaled load (core_busy) to calculate the next pstate
is not always correct, because there are cases that the load is
independent from current pstate. For example, a tight 'for' loop
through many sampling intervals will cause a load of 100% in
every pstate.

So, change the above method and calculate the next pstate with
the assumption that the next pstate should not depend on the
current pstate. The next pstate should only be directly
proportional to measured load.

Tested on Intel i7-3770 CPU @ 3.40GHz.
Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
increase ~1.5% in performance. Below the test results using turbostat
(5 iterations):

Without patch:

Ph. avg Time Total time PkgWatt Total Energy
79.63 266.416 57.74 15382.85984
79.63 265.609 57.87 15370.79283
79.57 266.994 57.54 15362.83476
79.53 265.304 57.83 15342.53032
79.71 265.977 57.76 15362.83152
avg 79.61 266.06 57.74 15364.36985

With patch:

Ph. avg Time Total time PkgWatt Total Energy
78.23 258.826 59.14 15306.96964
78.41 259.110 59.15 15326.35650
78.40 258.530 59.26 15320.48780
78.46 258.673 59.20 15313.44160
78.19 259.075 59.16 15326.87700
avg 78.34 258.842 59.18 15318.82650

The total test time reduced by ~2.6%, while the total energy
consumption during a test iteration reduced by ~0.35%

Signed-off-by: Stratos Karafotis <stratosk@xxxxxxxxxxxx>
---

Changes v1 -> v2
- Enhance change log as Rafael and Viresh suggested


drivers/cpufreq/intel_pstate.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 0999673..8e309db 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -608,28 +608,27 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
mod_timer_pinned(&cpu->timer, jiffies + delay);
}

-static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
+static inline int32_t intel_pstate_get_busy(struct cpudata *cpu)
{
- int32_t core_busy, max_pstate, current_pstate;
+ int32_t core_busy, max_pstate;

core_busy = cpu->sample.core_pct_busy;
max_pstate = int_tofp(cpu->pstate.max_pstate);
- current_pstate = int_tofp(cpu->pstate.current_pstate);
- core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
+ core_busy = mul_fp(core_busy, max_pstate);

NAK, The goal of this code is to find out how busy the core is at the current
P state. This change will return a value WAY too high.

Assume core_busy is 100 and the max non-turbo P state is 34 (3.4GHz) this code
would return a busy value of 3400. The PID is trying to keep the busy value
at the setpoint any value of ~3% will drive the P state to the highest turbo
P state in this example.


return FP_ROUNDUP(core_busy);
}

static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
{
- int32_t busy_scaled;
+ int32_t busy;
struct _pid *pid;
signed int ctl = 0;
int steps;

pid = &cpu->pid;
- busy_scaled = intel_pstate_get_scaled_busy(cpu);
+ busy = intel_pstate_get_busy(cpu);

- ctl = pid_calc(pid, busy_scaled);
+ ctl = pid_calc(pid, busy);

steps = abs(ctl);

@@ -651,7 +650,7 @@ static void intel_pstate_timer_func(unsigned long __data)
intel_pstate_adjust_busy_pstate(cpu);

trace_pstate_sample(fp_toint(sample->core_pct_busy),
- fp_toint(intel_pstate_get_scaled_busy(cpu)),
+ fp_toint(intel_pstate_get_busy(cpu)),
cpu->pstate.current_pstate,
sample->mperf,
sample->aperf,


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/