Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle stateis acceptable

From: Preeti U Murthy
Date: Tue Jan 14 2014 - 03:28:45 EST


Hi Srivatsa,

On 01/14/2014 12:30 PM, Srivatsa S. Bhat wrote:
> On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
>> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
>> Inspite of this it was observed that the idle state count of the shallowest
>> idle state, snooze, was increasing.
>>
>> This is because the governor returns the idle state index as 0 even in
>> scenarios when no idle state can be chosen. These scenarios could be when the
>> latency requirement is 0 or as mentioned above when the user wants to disable
>> certain cpu idle states at runtime. In the latter case, its possible that no
>> cpu idle state is valid because the suitable states were disabled
>> and the rest did not match the menu governor criteria to be chosen as the
>> next idle state.
>>
>> This patch adds the code to indicate that a valid cpu idle state could not be
>> chosen by the menu governor and reports back to arch so that it can take some
>> default action.
>>
>
> That sounds fair enough. However, the "default" action of pseries idle loop
> (pseries_lpar_idle()) surprises me. It enters Cede, which is _deeper_ than doing
> a snooze! IOW, a user might "disable" cpuidle or set the PM_QOS_CPU_DMA_LATENCY
> to 0 hoping to prevent the CPUs from going to deep idle states, but then the
> machine would still end up going to Cede, even though that wont get reflected
> in the idle state counts. IMHO that scenario needs some thought as well...

Yes I did see this, but since the patch intends to only communicate
whether the cpuidle governor was successful in choosing an idle state on
its part, I wished to address the default action of pseries idle loop
separately. You are right we will need to understand the patch which
introduced this action. I will take a look at it.

>
>> Signed-off-by: Preeti U Murthy <preeti@xxxxxxxxxxxxxxxxxx>
>> ---
>>
>> drivers/cpuidle/cpuidle.c | 6 +++++-
>> drivers/cpuidle/governors/menu.c | 7 ++++---
>> 2 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>> index a55e68f..5bf06bb 100644
>> --- a/drivers/cpuidle/cpuidle.c
>> +++ b/drivers/cpuidle/cpuidle.c
>> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
>>
>> /* ask the governor for the next state */
>> next_state = cpuidle_curr_governor->select(drv, dev);
>> +
>> + dev->last_residency = 0;
>> if (need_resched()) {
>> - dev->last_residency = 0;
>> /* give the governor an opportunity to reflect on the outcome */
>> if (cpuidle_curr_governor->reflect)
>> cpuidle_curr_governor->reflect(dev, next_state);
>
> The comments on top of the .reflect() routines of the governors say that the
> second parameter is the index of the actual state entered. But after this patch,
> next_state can be negative, indicating an invalid index. So those comments need
> to be updated accordingly.

Right, I will take care of the comment in the next post.
>
>> @@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
>> return 0;
>> }
>>
>> + if (next_state < 0)
>> + return -EINVAL;
>
> The exit path above (due to need_resched) returns with irqs enabled, but the new
> one you are adding (next_state < 0) returns with irqs disabled. This is correct,
> because in the latter case, "idle" is still in progress and the arch will choose
> a default handler to execute (unlike the former case where "idle" is over and
> hence its time to enable interrupts).

Correct.
>
> IMHO it would be good to add comments around this code to explain this subtle
> difference. We can never be too careful with these things... ;-)

Ok, will do so.
>
>> +
>> trace_cpu_idle_rcuidle(next_state, dev->cpu);
>>
>> broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>> index cf7f2f0..6921543 100644
>> --- a/drivers/cpuidle/governors/menu.c
>> +++ b/drivers/cpuidle/governors/menu.c
>> @@ -283,6 +283,7 @@ again:
>> * menu_select - selects the next idle state to enter
>> * @drv: cpuidle driver containing state data
>> * @dev: the CPU
>> + * Returns -1 when no idle state is suitable
>> */
>> static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>> {
>> @@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>> int multiplier;
>> struct timespec t;
>>
>> - if (data->needs_update) {
>> + if (data->last_state_idx >= 0 && data->needs_update) {
> ^^^^^
> Doesn't hurt, but actually unnecessary, since ->needs_update is set to 1
> only when index >= 0.

Right we do not need this check. I was assuming that needs_update would
be consistent with the index >= 0 only in the need_resched() case. But
needs_update will get unset each time the governor is invoked to be set
only if index >= 0 thereafter.

>
>> menu_update(drv, dev);
>> data->needs_update = 0;
>> }
>>
>> - data->last_state_idx = 0;
>> + data->last_state_idx = -1;
>> data->exit_us = 0;
>>
>> /* Special case when user has set very strict latency requirement */
>> if (unlikely(latency_req == 0))
>> - return 0;
>> + return data->last_state_idx;
>>
>> /* determine the expected residency time, round up */
>> t = ktime_to_timespec(tick_nohz_get_sleep_length());
>>
>
> What about the ladder governor? I know its not used that much in practice,
> but I think it would be good to update that as well, just to keep it
> consistent.

Yes this needs to be updated as well. But the ladder governor has a few
other details to take care of in addition to what is taken care of in
the menu governor by this patch. Hence I will be posting that separately.

Thanks

Regards
Preeti U Murthy
>
> Regards,
> Srivatsa S. Bhat
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/