Re: [PATCH 6/6] cpuidle: teo: Don't count non-existent intercepts

From: Christian Loehle
Date: Mon Jun 10 2024 - 07:06:25 EST


On 6/7/24 11:17, Dietmar Eggemann wrote:
> On 06/06/2024 11:00, Christian Loehle wrote:
>> When bailing out early, teo will not query the sleep length anymore
>> since commit 6da8f9ba5a87 ("cpuidle: teo:
>> Skip tick_nohz_get_sleep_length() call in some cases") with an
>> expected sleep_length_ns value of KTIME_MAX.
>> This led to state0 accumulating lots of 'intercepts' because
>> the actually measured sleep length was < KTIME_MAX, so count KTIME_MAX
>> as a hit (we have to count them as something otherwise we are stuck).
>>
>> Fundamentally we can only do one of the two:
>> 1. Skip sleep_length_ns query when we think intercept is likely
>> 2. Have accurate data if sleep_length_ns is actually intercepted when
>> we believe it is currently intercepted.
>>
>> This patch chooses the latter as I've found the additional time it
>> takes to query the sleep length to be negligible and the variants of
>> option 1 (count all unknowns as misses or count all unknowns as hits)
>> had significant regressions (as misses had lots of too shallow idle
>> state selections and as hits had terrible performance in
>> intercept-heavy workloads).
>
> So '2.' is the 'if (prev_intercept_idx != idx && !idx)' case ?
>
> [...]

Yes, we allow the logic to bail out early, but not without querying the
expected sleep length.
(For idx > 0 the logic will continue to query the expected sleep length
later on.)
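
To illustrate why the query matters on the early bail-out path, here is a
minimal standalone sketch. It is not the teo code: every sketch_* name is
hypothetical, the single comparison in sketch_classify() is a simplified
stand-in for teo's per-bin hit/intercept accounting, and next_timer_ns
stands in for what tick_nohz_get_sleep_length() would return.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's KTIME_MAX. */
#define SKETCH_KTIME_MAX INT64_MAX

enum sketch_outcome { SKETCH_HIT, SKETCH_INTERCEPT };

/*
 * Sleep length recorded when bailing out early with idx == 0.
 * Without the query the placeholder KTIME_MAX is stored; with it,
 * the time until the next timer (next_timer_ns, an assumed input
 * replacing the real tick_nohz_get_sleep_length() call) is stored.
 */
static int64_t sketch_bailout_sleep_length(bool query_before_bailout,
					   int64_t next_timer_ns)
{
	return query_before_bailout ? next_timer_ns : SKETCH_KTIME_MAX;
}

/*
 * Simplified classification after wakeup: the wakeup counts as an
 * intercept if the CPU woke up before the recorded sleep length
 * elapsed, otherwise as a hit.
 */
static enum sketch_outcome sketch_classify(int64_t sleep_length_ns,
					   int64_t measured_ns)
{
	return measured_ns < sleep_length_ns ? SKETCH_INTERCEPT : SKETCH_HIT;
}
```

With a timer 500us away and a wakeup caused by that timer, the KTIME_MAX
placeholder makes the wakeup look like an intercept, while the queried
value lets it count as a hit. A genuine early wakeup is an intercept
either way.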

>
>> @@ -514,6 +521,14 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>> first_suitable_idx = i;
>> }
>> }
>> + if (prev_intercept_idx != idx && !idx) {
>
> if (!idx && prev_intercept_idx) ?
>

Thanks! I picked that up for the next version.
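
The two forms are equivalent: once !idx holds, idx is 0, so
prev_intercept_idx != idx reduces to prev_intercept_idx != 0. A small
standalone check of that, as a sketch only (the function names are made
up for illustration and the value range is arbitrary):

```c
#include <stdbool.h>

/* Condition as posted in this patch. */
static bool cond_as_posted(int prev_intercept_idx, int idx)
{
	return prev_intercept_idx != idx && !idx;
}

/* Condition as suggested in review. */
static bool cond_as_suggested(int prev_intercept_idx, int idx)
{
	return !idx && prev_intercept_idx;
}

/*
 * Compare both conditions for every idx in [0, max_idx] and every
 * prev_intercept_idx in [-1, max_idx]; -1 is included only to show
 * that the equivalence does not depend on the value being non-negative.
 */
static bool conds_agree(int max_idx)
{
	for (int idx = 0; idx <= max_idx; idx++) {
		for (int prev = -1; prev <= max_idx; prev++) {
			if (cond_as_posted(prev, idx) != cond_as_suggested(prev, idx))
				return false;
		}
	}
	return true;
}
```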

>> + /*
>> + * We have to query the sleep length here otherwise we don't
>> + * know after wakeup if our guess was correct.
>> + */
>> + duration_ns = tick_nohz_get_sleep_length(&delta_tick);
>> + cpu_data->sleep_length_ns = duration_ns;
>> + }
>>
>> /*
>> * If there is a latency constraint, it may be necessary to select an
>