Re: [PATCH RFC] cpuidle: consolidate calls to time capture

From: Daniel Lezcano
Date: Wed Mar 18 2020 - 10:32:38 EST


On 18/03/2020 12:04, Daniel Lezcano wrote:
>
> Hi Rafael,
>
> On 18/03/2020 11:17, Rafael J. Wysocki wrote:
>> On Monday, March 16, 2020 10:08:43 PM CET Daniel Lezcano wrote:
>>> A few years ago, we changed the code in cpuidle to replace ktime_get()
>>> by a local_clock() to get rid of potential seq lock in the path and an
>>> extra latency.
>>>
>>> Meanwhile, the code evolved and we are getting the time in some other
>>> places like the power domain governor and in the future break even
>>> deadline proposal.
>>
>> Hmm?
>>
>> Have any patches been posted for that?
>
> https://lkml.org/lkml/2020/3/11/1113
>
> https://lkml.org/lkml/2020/3/13/466
>
> but there is no consensus yet if that has a benefit or not.
>
>>> Unfortunately, as the time must be compared across the CPU, we have no
>>> other option than using the ktime_get() again. Hopefully, we can
>>> factor out all the calls to local_clock() and ktime_get() into a
>>> single one when the CPU is entering idle as the value will be reuse in
>>> different places.
>>
>> So there are cases in which it is not necessary to synchronize the time
>> between CPUs and those would take the overhead unnecessarily.
>>
>> This change looks premature to me at least.
>
> The idea is to call one time ktime_get() when entering idle and store
> the result in the struct cpuidle_device, so we have the information when
> we entered idle.
>
> Moreover, ktime_get() is called in do_idle() via:
>
> tick_nohz_idle_enter()
> tick_nohz_start_idle()
> ts->idle_entrytime = ktime_get();
>
> This is called at the first loop level. The idle loop is exiting and
> re-entering again without passing through tick_nohz_idle_enter() in the
> second loop level in case of interrupt processing, thus the
> idle_entrytime is not updated and the return of
> tick_nohz_get_sleep_length() will be greater than what is expected.
>
> May be we can consider ktime_get_mono_fast_ns() which is lockless with a
> particular care of the non-monotonic aspect if needed. Given the
> description at [1] the time jump could a few nanoseconds in case of NMI.
>
> The local_clock() can no be inspected across CPUs, the gap is too big
> and continues to increase during system lifetime.

I took the opportunity to measure the duration to a call to ktime_get,
ktime_get_mono_fast_ns and local_clock.

The result is an average of 10000 measurements and an average of 1000
run of those.

The duration is measured with local_clock(), ktime_get() and
ktime_get_mono_fast_ns()

Measurement with local_clock():
-------------------------------

ktime_get():
N min max sum mean stddev
1000 71 168 109052 109.052 13.0278

ktime_get_mono_fast_ns():
N min max sum mean stddev
1000 66 153 101135 101.135 11.9262

local_clock():
N min max sum mean stddev
1000 70 163 106896 106.896 12.8575


Measurement with ktime_get():
-----------------------------

ktime_get():
N min max sum mean stddev
1000 71 124 100465 100.465 10.0272

ktime_get_mono_fast_ns():
N min max sum mean stddev
1000 67 124 94451 94.451 9.67218

local_clock():
N min max sum mean stddev
1000 71 123 99765 99.765 10.0508


Measurement with ktime_get_mono_fast_ns():
------------------------------------------

ktime_get():
N min max sum mean stddev
1000 67 116 87562 87.562 4.38399

ktime_get_mono_fast_ns():
N min max sum mean stddev
1000 62 104 81017 81.017 4.12453

local_clock():
N min max sum mean stddev
1000 65 110 85919 85.919 4.24859



--
<http://www.linaro.org/> Linaro.org â Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog