Re: [PATCH V3 1/6] sched: idle: Add a weak arch_cpu_idle_poll function

From: Preeti U Murthy
Date: Tue Nov 11 2014 - 06:00:32 EST


On 11/10/2014 08:47 PM, Peter Zijlstra wrote:
> On Mon, Nov 10, 2014 at 07:50:22PM +0530, Preeti U Murthy wrote:
>> Hi Peter,
>>
>> On 11/10/2014 05:59 PM, Peter Zijlstra wrote:
>>> On Fri, Nov 07, 2014 at 03:31:22PM +0100, Daniel Lezcano wrote:
>>>> The poll function is called when a timer expired or if we force to poll when
>>>> the cpu_idle_force_poll option is set.
>>>>
>>>> The poll function does:
>>>>
>>>> local_irq_enable();
>>>> while (!tif_need_resched())
>>>> cpu_relax();
>>>>
>>>> This default poll function suits for the x86 arch because its rep; nop;
>>>> hardware power optimization. But on other archs, this optimization does not
>>>> exists and we are not saving power. The arch specific bits may want to
>>>> optimize this loop by adding their own optimization.
>>>
>>> This doesn't make sense to me; should an arch not either implement an
>>> actual idle driver or implement cpu_relax() properly, why allow for a
>>> third weird option?
>>>
>>
>> The previous version of this patch simply invoked cpu_idle_loop() for
>> cases where latency_req was 0. This would have changed the behavior
>> on PowerPC wherein earlier the 0th idle index was returned which is also
>> a polling loop but differs from cpu_idle_loop() in two ways:
>>
>> a. It polls at a relatively lower power state than cpu_relax().
>> b. We set certain registers to indicate that the cpu is idle.
>
> So I'm confused; the current code runs the generic cpu_relax idle poll
> loop for the broadcast case. I suppose you want to retain this because
> not doing your a-b above will indeed give you a lower latency.
>
> Therefore one could argue that latency_req==0 should indeed use this,
> and your a-b idle state should be latency_req==1 or higher.
>
> Thus yes it changes behaviour, but I think it actually fixes something.
> You cannot have a latency_req==0 state which has higher latency than the
> actual polling loop, as you appear to have.

Yes you are right. This is fixing the current behavior. So we should be
good to call cpuidle_idle_loop() when latency_req=0.

>
>> Hence for all such cases wherein the cpu is required to poll while idle
>> (only for cases such as force poll, broadcast ipi to arrive soon and
>> latency_req = 0), we should be able to call into cpuidle_idle_loop()
>> only if the cpuidle driver's 0th idle state has an exit_latency > 0.
>> (The 0th idle state is expected to be a polling loop with
>> exit_latency = 0).
>>
>> If otherwise, it would mean the driver has an optimized polling loop
>> when idle. But instead of adding in the logic of checking the
>> exit_latency, we thought it would be simpler to call into an arch
>> defined polling idle loop under the above circumstances. If that is no
>> better we could fall back to cpuidle_idle_loop().
>
> That still doesn't make sense to me; suppose the implementation of this
> special poll state differs on different uarchs for the same arch, then
> we'll end up with another registration and selection interface parallel
> to cpuidle.

Yeah this will only get complicated. I was trying to see how to keep the
current behavior unchanged but looks like it is not worth it.

Thanks

Regards
Preeti U Murthy
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/