RE: [RFC/RFT][PATCH v8] cpuidle: New timer events oriented governor for tickless systems

From: Doug Smythies
Date: Thu Oct 10 2019 - 03:05:21 EST


On 2019.10.09 06:37 Rafael J. Wysocki wrote:
> On Wednesday, October 9, 2019 1:19:51 AM CEST Rafael J. Wysocki wrote:
>> On Tuesday, October 8, 2019 12:49:01 PM CEST Rafael J. Wysocki wrote:
>>> On Tue, Oct 8, 2019 at 11:51 AM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>>>> On Tue, Oct 8, 2019 at 8:20 AM Doug Smythies <dsmythies@xxxxxxxxx> wrote:
>>>>> O.K. Thanks for your quick reply, and insight.
>>>>>
>>>>> I think long durations always need to be counted, but currently if
>>>>> the deepest idle state is disabled, they are not.
...
>>>> AFAICS, adding early_hits to count is not a mistake if there are still
>>>> enabled states deeper than the current one.
>>>
>>> And the mistake appears to be that the "hits" and "misses" metrics
>>> aren't handled in analogy with the "early_hits" one when the current
>>> state is disabled.

I only know how to exploit and test the "hits" and "misses" path
that should use the deepest available idle state upon transition
to an idle system. Even so, the test has a low probability of
failing, and so needs to be run many times.

I do not know how to demonstrate and/or test any "early_hits" path
to confirm that an issue exists or that it is fixed.

>>>
>>> Let me try to cut a patch to address that.
>>
>> Appended below, not tested.

Reference as: rjw1

>>
>> It is meant to address two problems, one of which is that the "hits" and
>> "misses" metrics of disabled states need to be taken into account too in
>> some cases, and the other is an issue with the handling of "early hits"
>> which may lead to suboptimal state selection if some states are disabled.
>
> Well, it still misses a couple of points.
>
> First, disabled states that are too deep should not be taken into consideration
> at all.
>
> Second, the "hits" and "misses" metrics of disabled states need to be used for
> idle duration ranges corresponding to them regardless of whether or not the
> "hits" value is greater than the "misses" one.
>
> Updated patch is below (still not tested), but it tries to do too much in one
> go, so I need to split it into a series of smaller changes.

Thanks for your continued look at this.

Reference as: rjw2

Test 1, hack job statistical test (old tests re-stated):

Kernel          tests   fail rate
5.4-rc1          6616   13.45%
5.3              2376    4.50%
5.3-teov7       12136    0.00%  <<< teo.c reverted and teov7 put in its place.
5.4-rc1-ds      11168    0.00%  <<< [old] ds proposed patch (> 7 hours test time)
5.4-rc1-ds12     4224    0.00%  <<< [old] new ds proposed patch
5.4-rc2-rjw1    11280    0.00%
5.4-rc2-rjw2      640    0.00%  <<< Will be run again, for longer.
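
A rough way to read the 0.00% rows: zero failures in N runs only bounds the
true fail rate from above, and the bound tightens as N grows. The sketch below
computes that bound. It assumes the runs are independent with a fixed failure
probability, which the test may not strictly satisfy, and the function name is
made up for illustration.

```python
import math


def zero_failure_upper_bound(tests: int, confidence: float = 0.95) -> float:
    """Largest per-run failure probability still consistent, at the given
    confidence, with seeing zero failures in `tests` independent runs."""
    if tests <= 0:
        raise ValueError("tests must be positive")
    # Solve (1 - p) ** tests = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / tests)


# Run counts copied from two of the 0.00% rows of the table.
for kernel, tests in [("5.4-rc2-rjw1", 11280), ("5.4-rc2-rjw2", 640)]:
    bound = zero_failure_upper_bound(tests)
    print(f"{kernel}: {tests} runs, 0 failures -> fail rate below {bound:.3%} (95%)")
```

Under those assumptions, 640 clean runs only bound the fail rate to roughly
0.47%, while 11280 clean runs bound it to roughly 0.027%, which is the reason
for running rjw2 again for longer.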

Test 2: I also looked at every possible enable/disable idle combination,
and they all seemed O.K.
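
For anyone wanting to repeat Test 2, here is a minimal sketch of how the
combinations could be enumerated and applied through the cpuidle sysfs
"disable" files. It is not the procedure used above, which is not described in
this message. It assumes five idle states numbered 0 through 4 (state 4 being
C6, as listed under "System"), and the helper names are hypothetical. Whether
every combination is meaningful (for example, all states disabled) is left to
the tester.

```python
from itertools import product
from pathlib import Path

NUM_STATES = 5  # assumption: states 0..4, with state 4 the deepest (C6)


def disable_masks(num_states: int = NUM_STATES):
    """Yield every enable/disable combination as a tuple of 0/1 flags,
    one flag per idle state (1 means disabled)."""
    yield from product((0, 1), repeat=num_states)


def apply_mask(mask, cpu_root: Path = Path("/sys/devices/system/cpu")) -> None:
    """Write the flags to every CPU's cpuidle 'disable' files (needs root)."""
    for cpu_dir in sorted(cpu_root.glob("cpu[0-9]*")):
        for state, flag in enumerate(mask):
            path = cpu_dir / "cpuidle" / f"state{state}" / "disable"
            path.write_text(f"{flag}\n")


masks = list(disable_masks())
print(f"{len(masks)} combinations to check")
```

Calling apply_mask(mask) for each mask in turn, then observing the idle state
usage counters, would walk through all 32 combinations.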

No other tests have been run yet.

System:
Processor: i7-2600K
Deepest idle state: 4 (C6)

... Doug