Re: [PATCH v2] cpuidle: Add 'above' and 'below' idle state metrics

From: Daniel Lezcano
Date: Mon Jan 14 2019 - 05:39:16 EST



Hi Rafael,

sorry for the delay.

On 10/01/2019 11:20, Rafael J. Wysocki wrote:

[ ... ]

>>> if (entered_state >= 0) {
>>> + s64 diff, delay = drv->states[entered_state].exit_latency;
>>> + int i;
>>> +
>>> /*
>>> * Update cpuidle counters
>>> * This can be moved to within driver enter routine,
>>> @@ -260,6 +262,33 @@ int cpuidle_enter_state(struct cpuidle_d
>>> dev->last_residency = (int)diff;
>>
>> Shouldn't we subtract the 'delay' from the computed 'diff' in any case ?
>
> No.
>
>> Otherwise the 'last_residency' accumulates the effective sleep time and
>> the time to wakeup. We are interested in the sleep time only for
>> prediction and metrics no ?
>
> Yes, but 'delay' is the worst-case latency and not the actual one
> experienced, most of the time, and (on average) we would underestimate
> the sleep time if it was always subtracted.

IMO, the exit latency is more or less constant for the cpu power down
state. When it is the cluster power down state, the first cpu waking up
has the worst latency, then the others have the same has the cpu power
down state.

If we can model that, the gray area you mention below can be reduced.
There are platform where the exit latency is very high [1] and not
taking it into account will give very wrong metrics.

> The idea here is to only count the wakeup as 'above' if the total
> 'last_residency' is below the target residency of the idle state that
> was asked for (as in that case we know for certain that the CPU has
> been woken up too early) and to only count it as 'below' if the
> difference between 'last_residency' and 'delay' is greater than or
> equal to the target residency of a deeper idle state (as in that case
> we know for certain that the CPU has been woken up too late).
>
> Of course, this means that there is a "gray area" in which we are not
> really sure if the sleep time has matched the idle state that was
> asked for, but there's not much we can do about that IMO.

There is another aspect of the metric which can be improved, the 'above'
and the 'below' give an rough indication about the correctness of the
prediction but they don't tell us which idle state we should have
selected (we can be constantly choosing state3 instead of state1 for
example).

It would be nice to add a 'missed' field for each idle states, so when
we check if there is a 'above' or a 'below' condition, we increment the
idle state 'missed' field for the idle state we should have selected.

-- Daniel

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/hisilicon/hi3660.dtsi#n199


--
<http://www.linaro.org/> Linaro.org â Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog