Re: [PATCH V2] ARM: cpuidle: Support asymmetric idle definition

From: Sudeep Holla
Date: Fri Jun 02 2017 - 09:45:55 EST




On 02/06/17 11:40, Daniel Lezcano wrote:
> On 02/06/2017 12:14, Sudeep Holla wrote:
>>
>>
>> On 02/06/17 11:06, Daniel Lezcano wrote:
>>> On 02/06/2017 11:39, Sudeep Holla wrote:
>>>>
>>>>
>>>> On 02/06/17 10:25, Daniel Lezcano wrote:
>>>>> On 02/06/2017 11:20, Sudeep Holla wrote:
>>>>>>
>>>>>>
>>>>>> On 01/06/17 12:39, Daniel Lezcano wrote:
>>>>>>> Some hardware have clusters with different idle states. The current code does
>>>>>>> not support this and fails as it expects all the idle states to be identical.
>>>>>>>
>>>>>>> Because of this, the Mediatek mtk8173 had to create the same idle state for a
>>>>>>> big.Little system and now the Hisilicon 960 is facing the same situation.
>>>>>>>
>>>>>>> Solve this by simply assuming the multiple driver will be needed for all the
>>>>>>> platforms using the ARM generic cpuidle driver which makes sense because of the
>>>>>>> different topologies we can support with a single kernel for ARM32 or ARM64.
>>>>>>>
>>>>>>> Every CPU has its own driver, so every single CPU can specify in the DT the
>>>>>>> idle states.
>>>>>>>
>>>>>>> This simple approach allows to support the future dynamIQ system, current SMP
>>>>>>> and HMP.
>>>>>>>
>>>>>>> It is unoptimal from a memory point of view for a system with a large number of
>>>>>>> CPUs but nowadays there is no such system with a cpuidle driver on ARM.
>>>>>>>
>>>>>>
>>>>>> While I agree this may be simple solution, but just not necessary for
>>>>>> systems with symmetric idle states especially one with large number of
>>>>>> CPUs. I don't like to see 96 CPU Idle driver on say ThunderX. So we
>>>>>> *must* have some basic distinction done here.
>>>>>>
>>>>>> IMO, we can't punish a large SMP systems just because they don't have
>>>>>> asymmetric idle states.
>>>>>
>>>>> Can you point me in the upstream kernel a DTS with 96 cpus and using the
>>>>> cpuidle-arm driver ?
>>>>>
>>>>
>>>> The bindings are upstream right. Not all DTS are upstream, firmware
>>>> generate them especially for large systems.
>>>>
>>>> Check arch/arm64/boot/dts/cavium/thunder{,2}-{88,99}xx.dtsi, it has
>>>> supports PSCI and firmware can update DTB to add the idle states.
>>>> They are systems with 96 and 128 CPUs.
>>>
>>> Ok, so I have to assume there are out of tree DTB with idle state
>>> definitions. In other circumstances I would have just ignored this
>>> argument but I can admit the DTB blob thing is in the grey area between
>>> what we have to support upstream and out of tree changes.
>>>
>>
>> Not entirely true. It's clear, we support anything whose binding is
>> upstream. I do agree that there are lots of out of tree bindings but I
>> am not referring to them. I am just referring to out of tree DTS with
>> already upstreamed binding.
>
> That is what I meant :)
>
>>> However, even if it is suboptimal, I'm convinced doing a per-cpu driver
>>> makes more sense.
>>>
>>
>> OK, but I think a simple check to decide not to, on SMP system is not
>> too much a ask ?
>
> Yes, it is :)
>

If you are objecting out of the tree DTS whose bindings are upstream,
then that's simply wrong.
1. Firmware can generate/update DTB
2. There's or was a plan to move DTS out of the tree.
In short, bindings is all what matters.

> We end up with a hack by parsing and analyzing the DT based on the idea
> different idle states means different cluster where you stated right
> before in the topology comment that is not necessary true. So Lorenzo's
> patch adds heuristic which may be wrong.
>

Completely disagree. When it may look hacky, but it's not heuristics.
The CPU idle bindings doesn't and need not link with CPU topology.
I re-iterate that "different idle states doesn't means different
cluster". It just means they are probably in different power domains.
Power domains need not share topological boundaries.

> If we decide to consider all cpus with their own set of idle states,
> everything is solved with a cpuidle driver per cpu.
>
> The only drawback is the memory consumption.
>

Yes, I get that. I just feel that we need not do that for systems that
don't need it.

> But on the other side, as the idle state are per cpu, we don't trash the
> cache.
>
> Moreover, we already had that, an array of idle states per cpu (see
> commit 46bcfad7), it was in the struct cpuidle_device instead of the
> struct cpuidle_driver. It has been moved away in the latter with the
> assumption the SMP have all the same idle states but now we see it is no
> longer true.
>
> Creating a cpuidle_driver per cpu is like using the previous approach.
> The only thing is we must improve to save memory, that's all (eg. a
> pointer allocating the idle states rather than hardcoding it).
>
> We can live with a driver per cpu and do the memory optimization.
>
> If we have an observed regression with large SMP systems, we fix it.
>

OK, I am fine with that if you agree that we can fix it later.

--
Regards,
Sudeep