Re: [PATCHv1 7/8] arm64: dts: qcom: msm8998: Add PSCI cpuidle low power states

From: Amit Kucheria
Date: Fri May 10 2019 - 10:14:20 EST


On Fri, May 10, 2019 at 6:45 PM Marc Gonzalez <marc.w.gonzalez@xxxxxxx> wrote:
>
> On 10/05/2019 13:29, Amit Kucheria wrote:
>
> > Add device bindings for cpuidle states for cpu devices.
> >
> > Cc: Marc Gonzalez <marc.w.gonzalez@xxxxxxx>
> > Signed-off-by: Amit Kucheria <amit.kucheria@xxxxxxxxxx>
> > ---
> > arch/arm64/boot/dts/qcom/msm8998.dtsi | 32 +++++++++++++++++++++++++++
> > 1 file changed, 32 insertions(+)
> >
> > diff --git a/arch/arm64/boot/dts/qcom/msm8998.dtsi b/arch/arm64/boot/dts/qcom/msm8998.dtsi
> > index 3fd0769fe648..208281f318e2 100644
> > --- a/arch/arm64/boot/dts/qcom/msm8998.dtsi
> > +++ b/arch/arm64/boot/dts/qcom/msm8998.dtsi
> > @@ -78,6 +78,7 @@
> > compatible = "arm,armv8";
> > reg = <0x0 0x0>;
> > enable-method = "psci";
> > + cpu-idle-states = <&LITTLE_CPU_PD>;
>
> For some reason, I was expecting the big cores to come first, but according
> to /proc/cpuinfo, cores 0-3 are part 0x801, while cores 4-7 are part 0x800.
>
> According to https://github.com/pytorch/cpuinfo/blob/master/src/arm/uarch.c
>
> 0x801 = Low-power Kryo 260 / 280 "Silver" -> Cortex-A53
> 0x800 = High-performance Kryo 260 (r10p2) / Kryo 280 (r10p1) "Gold" -> Cortex-A73

Hmm, did I mess up the order of the big and LITTLE cores? I'll take
another look.
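If it helps, here is a rough sketch for double-checking the ordering on a
live board. It prints the "CPU part" value for each logical CPU in
/proc/cpuinfo and labels it using only the two part numbers quoted above
(0x801 Silver, 0x800 Gold). In the patch, cpu0-cpu3 (reg 0x0-0x3) get
LITTLE_CPU_PD and cpu4-cpu7 (reg 0x100-0x103) get BIG_CPU_PD, so for the
patch to be right the LITTLE lines should be cpu0-cpu3. The helper name
cpu_clusters is made up for this example, and any other part number is
simply reported as unknown.

```shell
# Sketch: label each logical CPU in a cpuinfo-format file by its "CPU part".
# Only the two MSM8998 part numbers quoted in this thread are mapped;
# cpu_clusters is a made-up helper name.
cpu_clusters() {
	awk -F': *' '
		/^processor/ { cpu = $2 }          # remember the current logical CPU
		/^CPU part/ {
			part = $2
			if (part == "0x801") kind = "LITTLE"     # Silver
			else if (part == "0x800") kind = "big"   # Gold
			else kind = "unknown"
			printf "cpu%s %s %s\n", cpu, part, kind
		}
	' "${1:-/proc/cpuinfo}"
}
```

Run it as plain "cpu_clusters" on the board (it reads /proc/cpuinfo by
default), or pass a saved copy of cpuinfo as the first argument.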

> > efficiency = <1024>;
> > next-level-cache = <&L2_0>;
> > L2_0: l2-cache {
> > @@ -97,6 +98,7 @@
> > compatible = "arm,armv8";
> > reg = <0x0 0x1>;
> > enable-method = "psci";
> > + cpu-idle-states = <&LITTLE_CPU_PD>;
> > efficiency = <1024>;
> > next-level-cache = <&L2_0>;
> > L1_I_1: l1-icache {
> > @@ -112,6 +114,7 @@
> > compatible = "arm,armv8";
> > reg = <0x0 0x2>;
> > enable-method = "psci";
> > + cpu-idle-states = <&LITTLE_CPU_PD>;
> > efficiency = <1024>;
> > next-level-cache = <&L2_0>;
> > L1_I_2: l1-icache {
> > @@ -127,6 +130,7 @@
> > compatible = "arm,armv8";
> > reg = <0x0 0x3>;
> > enable-method = "psci";
> > + cpu-idle-states = <&LITTLE_CPU_PD>;
> > efficiency = <1024>;
> > next-level-cache = <&L2_0>;
> > L1_I_3: l1-icache {
> > @@ -142,6 +146,7 @@
> > compatible = "arm,armv8";
> > reg = <0x0 0x100>;
> > enable-method = "psci";
> > + cpu-idle-states = <&BIG_CPU_PD>;
> > efficiency = <1536>;
> > next-level-cache = <&L2_1>;
> > L2_1: l2-cache {
> > @@ -161,6 +166,7 @@
> > compatible = "arm,armv8";
> > reg = <0x0 0x101>;
> > enable-method = "psci";
> > + cpu-idle-states = <&BIG_CPU_PD>;
> > efficiency = <1536>;
> > next-level-cache = <&L2_1>;
> > L1_I_101: l1-icache {
> > @@ -176,6 +182,7 @@
> > compatible = "arm,armv8";
> > reg = <0x0 0x102>;
> > enable-method = "psci";
> > + cpu-idle-states = <&BIG_CPU_PD>;
> > efficiency = <1536>;
> > next-level-cache = <&L2_1>;
> > L1_I_102: l1-icache {
> > @@ -191,6 +198,7 @@
> > compatible = "arm,armv8";
> > reg = <0x0 0x103>;
> > enable-method = "psci";
> > + cpu-idle-states = <&BIG_CPU_PD>;
> > efficiency = <1536>;
> > next-level-cache = <&L2_1>;
> > L1_I_103: l1-icache {
> > @@ -238,6 +246,30 @@
> > };
> > };
> > };
> > +
> > + idle-states {
> > + entry-method="psci";
> > +
> > + LITTLE_CPU_PD: little-power-down {
> > + compatible = "arm,idle-state";
> > + idle-state-name = "little-power-down";
> > + arm,psci-suspend-param = <0x00000002>;
> > + entry-latency-us = <43>;
> > + exit-latency-us = <43>;
>
> Little cores have higher latency (+5%) than big cores?
>
> > + min-residency-us = <200>;
> > + local-timer-stop;
> > + };
> > +
> > + BIG_CPU_PD: big-power-down {
> > + compatible = "arm,idle-state";
> > + idle-state-name = "big-power-down";
> > + arm,psci-suspend-param = <0x00000002>;
> > + entry-latency-us = <41>;
> > + exit-latency-us = <41>;
> > + min-residency-us = <200>;
> > + local-timer-stop;
> > + };
> > + };
>
> What is the simplest way to test this patch?

You should be able to see how often each state is entered (usage) and how
long it was resident (time) under /sys/devices/system/cpu/cpu?/cpuidle/state*/

$ grep "" /sys/devices/system/cpu/cpu?/cpuidle/state*/*
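A slightly more readable variant of that grep, as a sketch: it walks
/sys/devices/system/cpu/cpuN/cpuidle/stateM/ and prints the name, usage
(number of entries) and time (total residency in microseconds) of each
state. On arm64, state0 is normally WFI, so the states added by this patch
should show up as state1, named after the nodes little-power-down and
big-power-down. The idle_counters name and the optional root argument
(handy for pointing it at a copied tree) are my own additions, not an
existing tool.

```shell
# Sketch: print name, entry count and residency for every cpuidle state.
# The optional argument overrides the sysfs root; idle_counters is a made-up name.
idle_counters() {
	root=${1:-/sys/devices/system/cpu}
	for st in "$root"/cpu[0-9]*/cpuidle/state[0-9]*; do
		[ -d "$st" ] || continue           # skip if the glob matched nothing
		cpu=${st#"$root"/}                 # cpuN/cpuidle/stateM
		cpu=${cpu%%/*}                     # cpuN
		# usage = times the state was entered, time = total residency in us
		printf '%s %s name=%s usage=%s time_us=%s\n' \
			"$cpu" "${st##*/}" "$(cat "$st/name")" \
			"$(cat "$st/usage")" "$(cat "$st/time")"
	done
}
```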

And if you have an instrumented board with the power rails exposed, you
could measure the CPU rails with and without some load on the CPUs.
That'd also help us tune the latency and residency values in the future.
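Without rail measurements, a crude software-side check is to see how often
each state is entered while the CPUs are idle versus busy. The sketch below
snapshots every usage counter, runs whatever command you give it, and prints
how many entries each state gained. idle_delta and the IDLE_ROOT override
are hypothetical names for this example, and the counters say nothing about
actual power.

```shell
# Sketch: count cpuidle state entries while a given command runs.
# idle_delta and IDLE_ROOT are hypothetical names for this example.
idle_delta() {
	root=${IDLE_ROOT:-/sys/devices/system/cpu}
	snap=$(mktemp) || return 1
	# snapshot "path count" for every state's usage counter
	for f in "$root"/cpu[0-9]*/cpuidle/state[0-9]*/usage; do
		[ -f "$f" ] && printf '%s %s\n' "$f" "$(cat "$f")"
	done > "$snap"
	"$@"                                   # the workload, e.g. "sleep 10"
	# print how many entries each state gained during the workload
	while read -r f old; do
		printf '%s %d\n' "${f#"$root"/}" "$(( $(cat "$f") - old ))"
	done < "$snap"
	rm -f "$snap"
}
```

For example, "idle_delta sleep 10" gives an idle baseline, and the same call
with a CPU-bound command of your choice gives the loaded case; with the
states working you would expect far fewer deep-state entries under load.
Any output from the workload itself is mixed into the listing.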

Regards,
Amit