Re: [RFC PATCH] arm64: dts: allwinner: a64/h5: Add CPU idle states

From: Maxime Ripard
Date: Tue Mar 30 2021 - 04:51:35 EST


Hi Samuel,

On Tue, Mar 23, 2021 at 11:44:50PM -0500, Samuel Holland wrote:
> On 3/22/21 8:56 PM, Andre Przywara wrote:
> >> I'm sending this patch as an RFC because it raises questions about how
> >> we handle firmware versioning. How far back does (or should) our support
> >> for old TF-A and Crust versions go?
> >>
> >> cpuidle has a problem that without working firmware support, CPUs will
> >> enter idle states and be unable to wake up. As a result, the system will
> >> hang at some point during boot, usually before getting to userspace.
> >>
> >> For over a year[0], TF-A has exposed the PSCI CPU_SUSPEND function when
> >> a SCPI implementation is present[1]. Implementing CPU_SUSPEND is
> >> required for implementing SYSTEM_SUSPEND[2], even if CPU_SUSPEND is not
> >> itself used for anything.
> >>
> >> However, there was no code to actually wake up a CPU once it called the
> >> CPU_SUSPEND function, because I could not find the register providing
> >> the necessary information. The fact that CPU_SUSPEND was broken affected
> >> nobody, because nothing ever called it -- there were no idle states in
> >> the DTS. In hindsight, what I should have done was always return failure
> >> from sunxi_validate_power_state(), but that ship has long sailed.
> >>
> >> I finally found the elusive register and implemented the wakeup code
> >> earlier this month[3]. So now, CPU_SUSPEND actually works, if all of
> >> your firmware is up to date, and cpuidle works if you add the states in
> >> your device tree.
> >>
> >> Unfortunately, there is currently nothing verifying that compatibility.
> >> So you can get into four possible scenarios:
> >> 1) No idle states in DTS, any firmware => Linux works, with baseline
> >> power consumption.
> >> 2) Idle states added to DTS, no Crust/SCPI => Linux works, but every
> >> attempt to enter an idle state is rejected because CPU_SUSPEND is
> >> not hooked up. So power consumption increases by a sizable amount.
> >> 3) Idle states added to DTS, "old" Crust/SCPI (before [3]) => Linux
> >> fails to boot, because CPUs never return from idle states.
> >> 4) Idle states added to DTS, "new" Crust/SCPI (after [3]) => Linux
> >> works, with improved power consumption compared to the baseline.
> >>
> >> Obviously, we want to prevent scenario 3 if possible.
> >
> > So I think the core of the problem is that the DT describes some
> > firmware feature, but we have the DT bundled with the kernel, not the
> > firmware.
>
> I would say the core problem is that the firmware lies about supporting
> PSCI CPU_SUSPEND. Linux shouldn't be calling CPU_SUSPEND if the firmware
> declares it as unavailable, regardless of what is in the DTS.

I would say we have two core problems :)

> (Technically, per the PSCI standard, CPU_SUSPEND is a mandatory
> function, but a quick survey of the TF-A platforms shows it is far from
> universally implemented.)

U-boot also implements CPU_SUSPEND but will return -1 if you don't have
an implementation. I guess that at the moment we shouldn't be reporting
it as supported if we don't

But I also agree with Andre here, we shouldn't list cpuidles
capabilities in the DTS if we don't always have them either.

> > So is there any way we can detect an older crust version in U-Boot,
> > then remove any potential idle states from the DT?
>
> Let's assume that we are using a functioning SoC (H3) or the secure fuse
> is blown (A64) and therefore U-Boot cannot access SRAM A2. I can think
> of three ways it can learn about crust:
>
> a) PSCI_FEATURES (e.g. is CPU_SUSPEND supported)
> b) Metadata in the FIT image
> c) Custom SMCs
>
> TF-A has some additional methods available:
>
> d) The SCPI-reported firmware version
> e) The magic number at the beginning of the firmware binary
>
> > Granted, this requires recent U-Boot as well, but at least we could try
> > to mitigate the worst case a bit?
>
> If we're okay with modifying firmware to solve this problem, then I
> propose the following solution:
>
> 1) Version bump crust or change its magic number.
> 2) Modify TF-A to only report CPU_SUSPEND as available if it detects the
> new crust version. This would involve conditionally setting
> sunxi_scpi_psci_ops.validate_power_state, and updating psci_setup.c
> to also check for .validate_power_state when setting psci_caps.
> 3) Modify the Linux PSCI client to respect PSCI_FEATURES when setting
> psci_ops.cpu_suspend. cpuidle-psci checks for this function before
> setting up idle states.
> 4) Finally, after some time, add the idle states to the DTS.
>
> In fact, this solution solves both scenarios 2 and 3, because it also
> takes care of the native PM implementation, which doesn't implement
> CPU_SUSPEND at all.
>
> Does that sound workable?

If we can add the DT node at boot in crust (or in TF-A), then we don't
really need PSCI_FEATURES, and it would magically work once the firmware
is updated. It looks like a solid plan otherwise

Maxime

Attachment: signature.asc
Description: PGP signature