RE: Performance of low-cpu utilisation benchmark regressed severely since 4.6

From: Doug Smythies
Date: Tue Apr 25 2017 - 03:13:52 EST


On 2017.04.24 07:25 Doug wrote:
> On 2017.04.23 18:23 Srinivas Pandruvada wrote:
>> On Mon, 2017-04-24 at 02:59 +0200, Rafael J. Wysocki wrote:
>>> On Sun, Apr 23, 2017 at 5:31 PM, Doug Smythies <dsmythies@xxxxxxxxx> wrote:
>
>>>> It looks like the cost is mostly related to moving the load from
>>>> one CPU to another and waiting for the new one to ramp up then.
>> Last time when we analyzed Mel's result last year this was the
>> conclusion. The problem was more apparent on systems with per core
>> P-state.
>
> ?? I have never seen this particular use case before.
> Unless I have looked at the wrong thing, Mel's issue last year was a
> different use case.
>
> ...[cut]...
>
>>>>> We can do one more trick I forgot about. Namely, if we are about
>>>>> to increase the P-state, we can jump to the average between the
>>>>> target and the max instead of just the target, like in the
>>>>> appended patch (on top of linux-next).
>>>>>
>>>>> That will make the P-state selection really aggressive, so costly
>>>>> energetically, but it should make small jumps of the average load
>>>>> above 0 cause big jumps of the target P-state.
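
For reference, the trick described above amounts to roughly the
following in the driver's P-state selection path (a sketch only; the
function and variable names are illustrative, this is not the actual,
now-deleted, patch):

/*
 * Sketch of the "jump to the average of the target and the max" idea,
 * in the style of intel_pstate's P-state selection.  Illustrative
 * names only; not the actual (now-deleted) patch.
 */
static int pick_next_pstate(int current_pstate, int target_pstate,
			    int max_pstate)
{
	/*
	 * Only when ramping up: instead of going straight to the
	 * computed target, jump half way between the target and the
	 * maximum P-state, so that small increases in the average
	 * load produce big jumps of the target P-state.
	 */
	if (target_pstate > current_pstate)
		target_pstate = (target_pstate + max_pstate) / 2;

	return target_pstate;
}

The closer the computed target already is to the maximum, the smaller
the extra boost; a small increase from a low P-state jumps roughly
half way to the top, which is where the extra energy cost comes from.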
>>>> I'm already seeing the energy costs of some of this stuff.
>>>> 3050.2 Seconds.
>>> Is this with or without reducing the sampling interval?
>
> It was without reducing the sample interval.
>
> So, it was the branch you referred us to the other day:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
>
> with your patch (now deleted from this thread) applied.
>
>
> ...[cut]...
>
>>> Anyway, your results are somewhat counter-intuitive.
>
>>> Would it be possible to run this workload with the linux-next branch
>>> and the schedutil governor and see if the patch at
>>> https://patchwork.kernel.org/patch/9671829/ makes any difference?
>
> The run with that branch,
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
>
> plus that patch, is in progress.

Result of that run:
3387.76 Seconds.
Idle power 3.85 watts.

Other potentially interesting information for 2 hour idle test:
Driver called 21209 times. Maximum duration 2396 Seconds. Minimum duration 20 mSec.
Histogram of target pstates (pstate : count):
16 8
17 3149
18 1436
19 1479
20 196
21 2
22 3087
23 375
24 22
25 4
26 2
27 3736
28 2177
29 13
30 0
31 0
32 2
33 0
34 1533
35 246
36 0
37 4
38 3738

Compared to kernel 4.11-rc7 (passive mode, schedutil governor):
3297.82 Seconds (re-stated from a previous e-mail).
Idle power 3.81 watts.

Other potentially interesting information for 2 hour idle test:
Driver called 1631 times. Maximum duration 2510 Seconds. Minimum duration 0.587 mSec.
Histogram of target pstates (pstate : count; missing lines mean 0 occurrences):
16 813
24 2
38 816
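
In case anyone wants to reproduce this kind of tally: however the
counts above were actually collected, the binning step is nothing more
than counting occurrences per target pstate. A minimal sketch, assuming
one integer target pstate per line on standard input (an assumption for
illustration, not the script actually used):

/* Count occurrences of each target pstate, one integer per input line. */
#include <stdio.h>

#define MAX_PSTATE 64

int main(void)
{
	unsigned long counts[MAX_PSTATE + 1] = { 0 };
	int pstate;

	while (scanf("%d", &pstate) == 1)
		if (pstate >= 0 && pstate <= MAX_PSTATE)
			counts[pstate]++;

	for (pstate = 0; pstate <= MAX_PSTATE; pstate++)
		if (counts[pstate])
			printf("%d %lu\n", pstate, counts[pstate]);

	return 0;
}

Fed with just the target-pstate column extracted from the trace output,
it prints pstate/count pairs, omitting zero-count pstates as in the
second histogram above.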

... Doug