Re: [PATCH] arm64: suspend: Remove forcing error from suspend finisher
From: Sudeep Holla
Date: Fri Mar 20 2026 - 07:11:02 EST
On Fri, Mar 20, 2026 at 11:13:08AM +0100, Lorenzo Pieralisi wrote:
> On Thu, Mar 19, 2026 at 04:52:18PM +0000, Sudeep Holla wrote:
> > On Thu, Mar 19, 2026 at 03:14:01PM +0000, Sudeep Holla wrote:
> > > On Mon, Mar 16, 2026 at 02:18:18PM +0530, Maulik Shah wrote:
> > > > A successful cpu_suspend() may not always need to return through cpu_resume();
> > > > skipping that path saves the work and latency involved.
> > > >
> > > > Consider a scenario:
> > > >
> > > > A single physical CPU (pCPU) is shared between different virtual machines
> > > > (VMs) as virtual CPUs (vCPUs). VM-x's vCPU can request a powerdown state after
> > > > saving its context by invoking __cpu_suspend_enter(), whereas VM-y's vCPU
> > > > requests a state shallower than powerdown. The hypervisor aggregates these to
> > > > a non-powerdown state for the pCPU. A wakeup event for VM-x's vCPU may then
> > > > want to resume execution at the same place instead of jumping to cpu_resume(),
> > > > as the HW never actually reached the powerdown state that would have lost the
> > > > context.
> > > >
> > >
> > > Though I don't fully understand the intention/use-case for presenting the
> > > VMs with powerdown states ....
> > >
> > > > While VM-x's vCPU paid the latency cost of saving the context in the suspend
> > > > entry path, returning to the same place saves the latency of restoring the
> > > > context in the resume path.
> > > >
> > >
> > > I understand the exit-latency aspect, though the register set involved is not
> > > very large unless the driver notifier list is sizable on some platforms. This
> > > is typically the case in Platform Coordinated mode.
> > >
> > > > Consider another scenario:
> > > >
> > > > Newer CPUs include a feature called “powerdown abandon”. The feature is
> > > > based on the observation that events like GIC wakeups have a high
> > > > likelihood of happening while the CPU is in the middle of its powerdown
> > > > sequence (at wfi). Older CPUs will powerdown and immediately power back
> > > > up when this happens. The newer CPUs will “give up” midway through if
> > > > no context has been lost yet. This is possible as the powerdown operation
> > > > is lengthy and a large part of it does not lose context [1].
> > > >
> > >
> > > When you say "large part" above, do you mean that none of the CPU context, as
> > > visible to software, is lost? Otherwise, we would need to discuss that "large
> > > part" in more detail. From the kernel point of view, this is a simple boolean:
> > > context is either lost or retained. Anything in between is not valid, as we do
> > > not support partial context loss.
> > >
> > > > Here the wakeup arrives after the SW powerdown is done but before the HW is
> > > > fully powered down. From the SW view this is still a successful entry to
> > > > suspend, and since the HW did not lose the context there is no reason to
> > > > return at the entry address cpu_resume() to restore the context.
> > > >
> > >
> > > Yes, that may be worth considering from an optimization perspective. However,
> > > even if the firmware returns success, an aborted transition should still be
> > > counted as a failure regardless of the software state. That keeps the cpuidle
> > > entry statistics more accurate than reporting success. And it is a failure,
> > > as the OS expected to enter that powerdown state but there was a H/W abort.
> > >
> > > > Remove the forced failure in the kernel when execution does not resume at
> > > > cpu_resume(), as the kernel has no reason to treat such returns as failures
> > > > when the firmware has already reported success.
> > > >
> > >
> > > This is not possible with the current PSCI spec:
> > > "Powerdown states do not return on success because restart is through the
> > > entry point address at wakeup."
> > >
> >
> > OK, my bad. Sorry for that.
> > For some reason, I read "do not return" as "must not return".
> >
> > The spec allows this:
> > | The caller must not assume that a powerdown request will return using the
> > | specified entry point address. The powerdown request might not complete due,
> > | for example, to pending interrupts. It is also possible that, because of
> > | coordination with other cores, the actual state entered is shallower
> > | than the one requested. Because of this it is possible for an
> > | implementation to downgrade the powerdown state request to a standby
> > | state. In the case of a downgrade to standby, the implementation
> > | returns at the instruction following the PSCI call, at the Exception
> > | level of the caller, instead of returning by the specified entry point
> > | address. The return code in this case is SUCCESS. In the case of an
> > | early return due to a pending wakeup event, the implementation can
> > | return at the next instruction, with a return code of SUCCESS, or
> > | resume at the specified entry point address
> >
> > So we need to dig and check if there was any reason for returning "NOT
> > SUPPORTED" when the call returned success.
>
> Because we have no clue whatsoever about what happened. We need to get
> back to the cpu_suspend() caller and either say "we entered state X instead
> of Y" or report a failure (because an interrupted powerdown sequence *is* a
> failure for Linux - unless we want to make things up). We just can't know, so
> to me the code seems good as it is (we can debate about the error code, yes,
> but the gist does not change).
>
Completely agreed. The current code clearly handles a successful return, so,
in my opinion, nothing is broken. It is really just a matter of exploring
whether there is a better way to express this error condition. I doubt that is
possible, since it is either success or error; it cannot be both.
Just to clarify, my earlier comment was purely about the error code, not about
success versus error.
> Is it that we want to tell CPUidle that entering the state was successful even if
> the power down sequence was interrupted or the state demoted ?
That sounds like the ask to me, but that would be incorrect.
> That's tantamount to lying IMO and would skew the power stats, no ?
>
Yes, I agree.
> Let me know, I am just trying to understand this patch's goal.
>
I will let Maulik present his opinion, as I am aligned with you, Lorenzo.
--
Regards,
Sudeep