Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency

From: Rafael J. Wysocki
Date: Sun Jun 09 2013 - 16:49:55 EST


On Sunday, June 09, 2013 09:08:23 PM Stratos Karafotis wrote:
> On 06/09/2013 07:26 PM, Borislav Petkov wrote:
> > On Sun, Jun 09, 2013 at 12:18:09AM +0200, Rafael J. Wysocki wrote:
> >> The average power drawn by the package is slightly higher with the
> >> patchset applied (27.66 W vs 27.25 W), but since the time needed to
> >> complete the workload with the patchset applied was shorter by about
> >> 2.3 sec, the total energy used was less in the latter case (by about
> >> 25.7 J if I'm not mistaken, or 1% relative). This means that in the
> >> absence of a power limit between 27.25 W and 27.66 W it's better to
> >> use the kernel with the patchset applied for that particular workload
> >> from the performance and energy usage perspective.
> >>
> >> Good, hopefully that's going to be confirmed on other systems and/or
> >> with other workloads. :-)
> >
> > Yep, I see similar results on my AMD F15h.
> >
> > So there's a register which tells you what the current energy
> > consumption in Watts is and support for it is integrated in lm_sensors.
> > I did one read per second, for the duration of the kernel build (10-r5 +
> > tip), with and without the patch, and averaged out the results:
> >
> > without
> > =======
> >
> > 1. 158 samples, avg Watts: 116.915
> > 2. 158 samples, avg Watts: 116.855
> > 3. 158 samples, avg Watts: 116.737
> > 4. 158 samples, avg Watts: 116.792
> >
> > => 116.82475 avg Watts.
> >
> > with
> > ====
> >
> > 1. 157 samples, avg Watts: 116.496
> > 2. 156 samples, avg Watts: 117.535
> > 3. 156 samples, avg Watts: 118.174
> > 4. 157 samples, avg Watts: 117.95
> >
> > => 117.53875 avg Watts.
> >
> > So there's a slight rise in the average power consumption but the
> > sample count drops by 1 or 2, which is consistent with the observed
> > kernel build speedup of 1 or 2 seconds.
> >
> > perf doesn't show any significant difference with and without the patch
> > but those are single runs only.
> >
> > without
> > =======
> >
> > Performance counter stats for 'make -j9':
> >
> >    1167856.647713 task-clock              # 7.272 CPUs utilized
> >         1,071,177 context-switches        # 0.917 K/sec
> >            52,844 cpu-migrations          # 0.045 K/sec
> >        43,600,721 page-faults             # 0.037 M/sec
> > 4,712,068,048,465 cycles                  # 4.035 GHz
> > 1,181,730,064,794 stalled-cycles-frontend # 25.08% frontend cycles idle
> >   243,576,229,438 stalled-cycles-backend  # 5.17% backend cycles idle
> > 2,966,369,010,209 instructions            # 0.63 insns per cycle
> >                                           # 0.40 stalled cycles per insn
> >   651,136,706,156 branches                # 557.548 M/sec
> >    34,582,447,788 branch-misses           # 5.31% of all branches
> >
> >     160.599796045 seconds time elapsed
> >
> > with
> > ====
> >
> > Performance counter stats for 'make -j9':
> >
> >    1169278.095561 task-clock              # 7.271 CPUs utilized
> >         1,076,528 context-switches        # 0.921 K/sec
> >            53,284 cpu-migrations          # 0.046 K/sec
> >        43,598,610 page-faults             # 0.037 M/sec
> > 4,721,747,687,668 cycles                  # 4.038 GHz
> > 1,182,301,583,422 stalled-cycles-frontend # 25.04% frontend cycles idle
> >   248,675,448,161 stalled-cycles-backend  # 5.27% backend cycles idle
> > 2,967,419,684,598 instructions            # 0.63 insns per cycle
> >                                           # 0.40 stalled cycles per insn
> >   651,527,448,140 branches                # 557.205 M/sec
> >    34,560,656,638 branch-misses           # 5.30% of all branches
> >
> >     160.811815170 seconds time elapsed
>
> Hi,
>
> Boris, thanks so much for your tests!
>
> Rafael, thanks for your analysis!
>
> I did some additional tests to see how the CPU behaves at its low and high limits.
>
> I used the Phoronix Java SciMark 2.0 test (FFT, Monte Carlo, etc.) to check the patch
> under really heavy loads. The results were almost identical with and without this patch.
> This is the expected behavior, because I believe the load is greater than up_threshold
> most of the time in these cases.
>
> With this patch:
> Duration: 120.568521 sec
> Pkg_W: 20.97
>
> Without this patch:
> Duration: 120.606813 sec
> Pkg_W: 21.11

The kernel with the patch applied still uses slightly less energy, however.
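
For reference, the comparison above is just average package power times
duration. A minimal sketch of that arithmetic in Python, using the Duration
and Pkg_W values quoted above and assuming Pkg_W is the average package power
over the whole run (the function name is made up for illustration):

```python
def energy_joules(avg_power_w, duration_s):
    """Energy in joules, assuming avg_power_w is the mean power over the run."""
    return avg_power_w * duration_s

# Values from the Java SciMark runs quoted above.
with_patch = energy_joules(20.97, 120.568521)
without_patch = energy_joules(21.11, 120.606813)
saved = without_patch - with_patch

print(f"with patch:    {with_patch:.1f} J")
print(f"without patch: {without_patch:.1f} J")
print(f"saved:         {saved:.1f} J ({100 * saved / without_patch:.2f}%)")
```

Under that assumption the difference comes out at roughly 17.7 J, or about
0.7% of the energy used without the patch.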

> I also used a small program to check the CPU under very light loads with durations
> comparable to the sampling rate (10000 in my config).
> The program runs a tight 'for' loop lasting ~ (2 x sampling_rate).
> After this it sleeps for 5000 us.
> It repeats the above 100 times and then sleeps for 1 sec.
> The whole procedure repeats 15 times.
>
> Results show a slowdown (~4%) WITH this patch.
> However, less energy is used WITH this patch (25,23J ~3.3%).

Well, this means that your changes may hurt performance if the load comes and
goes in spikes, which is not so good. The fact that they cause less energy to
be used at the same time kind of balances that, though. [After all, we're
talking about the ondemand governor which should be used if the user wants to
sacrifice some performance for energy savings.]
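
To put numbers on that trade-off, here is a rough Python sketch that derives
the slowdown and the energy difference from the summary figures in the tables
quoted at the end of this message (total run time in seconds and Pkg_W). It
assumes Pkg_W is the average package power over the whole run; the names are
made up for illustration:

```python
# Figures taken from the summary lines of the quoted results:
# total run time in seconds and average package power (Pkg_W) in watts.
RUNS = {
    "without": {"duration_s": 55.027201, "pkg_w": 13.87},
    "with": {"duration_s": 57.290084, "pkg_w": 12.88},
}

def relative_change(new, old):
    """Signed change of `new` relative to `old`, in percent."""
    return 100.0 * (new - old) / old

# Energy = average power * duration, under the assumption stated above.
energy = {name: r["duration_s"] * r["pkg_w"] for name, r in RUNS.items()}

slowdown_pct = relative_change(RUNS["with"]["duration_s"],
                               RUNS["without"]["duration_s"])
energy_pct = relative_change(energy["with"], energy["without"])

print(f"run time with patch: {slowdown_pct:+.1f}%")
print(f"energy with patch:   {energy_pct:+.1f}% "
      f"({energy['with'] - energy['without']:+.1f} J)")
```

This gives a run time about 4.1% longer and energy about 3.3% lower with the
patch, in line with the ~4% and ~3.3% figures reported.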

It would be interesting to see if the picture changes for different time
intervals in your test program (e.g. loop duration that is not a multiple of
sampling_rate and sleep times different from 5000 us) to rule out any random
coincidences.

Can you possibly prepare a graph showing both the execution time and energy
consumption for several different loop durations in your program (let's keep
the 5000 us sleep for now), including multiples of sampling_rate as well as
some other durations?
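
A sweep like that could be scripted along the following lines. This is only a
sketch in Python, not the actual test program: it assumes each burst is a
fixed amount of work whose wall time is measured (as the "Avg time" lines in
the results suggest), and all function and parameter names are hypothetical.
Energy would still have to be measured separately while it runs.

```python
import time

def busy_work(iterations):
    """Fixed amount of CPU work; its wall time depends on the CPU frequency."""
    x = 0
    for i in range(iterations):
        x += i
    return x

def run_bursts(iterations, sleep_us=5000, bursts=100, pause_s=1.0, runs=15):
    """Return the average busy-loop time in microseconds for each run."""
    averages = []
    for _ in range(runs):
        total = 0.0
        for _ in range(bursts):
            start = time.perf_counter()
            busy_work(iterations)
            total += time.perf_counter() - start
            time.sleep(sleep_us / 1e6)  # idle gap between bursts
        averages.append(total / bursts * 1e6)
        time.sleep(pause_s)  # longer pause between runs
    return averages

def sweep(iteration_counts, **kwargs):
    """Map each loop length to the mean of its per-run average times (us)."""
    results = {}
    for n in iteration_counts:
        averages = run_bursts(n, **kwargs)
        results[n] = sum(averages) / len(averages)
    return results
```

Calling sweep() with a list of iteration counts, calibrated so that the bursts
last both multiples and non-multiples of sampling_rate, gives one execution
time per loop duration for the graph. The defaults mirror the 5000 us sleep,
100 repetitions, 1 sec pause and 15 runs described above.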

Thanks,
Rafael


> WITHOUT patch:
> ----------------
> Starting benchmark
> run 0
> Avg time: 21907 us
> run 1
> Avg time: 21792 us
> run 2
> Avg time: 21827 us
> run 3
> Avg time: 21831 us
> run 4
> Avg time: 21828 us
> run 5
> Avg time: 21838 us
> run 6
> Avg time: 21819 us
> run 7
> Avg time: 21836 us
> run 8
> Avg time: 21761 us
> run 9
> Avg time: 21586 us
> run 10
> Avg time: 20366 us
> run 11
> Avg time: 21732 us
> run 12
> Avg time: 20225 us
> run 13
> Avg time: 21818 us
> run 14
> Avg time: 21812 us
> Elapsed time: 55004.660000 msec
> cor CPU    %c0   GHz   TSC SMI    %c1   %c3    %c6   %c7 CTMP PTMP  %pc2  %pc3  %pc6  %pc7  Pkg_W  Cor_W  GFX_W
>           8.34  3.30  3.39   0   8.78  0.48  82.41  0.00   43   43  0.00  0.00  0.00  0.00  13.87   8.15   0.00
>   0   0   0.28  3.10  3.39   0   0.95  0.26  98.51  0.00   43   43  0.00  0.00  0.00  0.00  13.87   8.15   0.00
>   0   4   0.54  2.97  3.39   0   0.69
>   1   1   0.18  2.15  3.39   0  59.11  0.03  40.67  0.00   39
>   1   5  58.86  3.26  3.39   0   0.43
>   2   2   3.20  3.82  3.39   0   0.28  0.03  96.50  0.00   36
>   2   6   0.13  2.40  3.39   0   3.34
>   3   3   0.47  3.04  3.39   0   4.01  1.58  93.94  0.00   39
>   3   7   3.04  3.73  3.39   0   1.45
> 55.027201 sec
>
>
> WITH patch
> ----------
> Starting benchmark
> run 0
> Avg time: 23198 us
> run 1
> Avg time: 23100 us
> run 2
> Avg time: 23068 us
> run 3
> Avg time: 23101 us
> run 4
> Avg time: 23075 us
> run 5
> Avg time: 23173 us
> run 6
> Avg time: 23151 us
> run 7
> Avg time: 23123 us
> run 8
> Avg time: 23112 us
> run 9
> Avg time: 23157 us
> run 10
> Avg time: 23107 us
> run 11
> Avg time: 23146 us
> run 12
> Avg time: 23067 us
> run 13
> Avg time: 23189 us
> run 14
> Avg time: 23053 us
> Elapsed time: 57288.522000 msec
> cor CPU    %c0   GHz   TSC SMI    %c1   %c3    %c6   %c7 CTMP PTMP  %pc2  %pc3  %pc6  %pc7  Pkg_W  Cor_W  GFX_W
>           7.69  3.03  3.39   0   7.86  0.56  83.89  0.00   44   44  0.00  0.00  0.00  0.00  12.88   7.17   0.00
>   0   0  60.24  3.05  3.39   0   0.34  0.02  39.40  0.00   44   44  0.00  0.00  0.00  0.00  12.88   7.17   0.00
>   0   4   0.11  1.84  3.39   0  60.47
>   1   1   0.22  2.15  3.39   0   0.61  0.04  99.13  0.00   37
>   1   5   0.50  2.53  3.39   0   0.33
>   2   2   0.12  2.12  3.39   0   0.29  0.11  99.48  0.00   34
>   2   6   0.05  2.26  3.39   0   0.36
>   3   3   0.31  2.66  3.39   0   0.08  2.08  97.53  0.00   38
>   3   7   0.03  1.96  3.39   0   0.37
> 57.290084 sec
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.