Re: s2idle breaks on machines without cpuidle support

From: Hector Martin
Date: Wed Feb 08 2023 - 11:45:56 EST


On 09/02/2023 01.18, Sudeep Holla wrote:
> On Thu, Feb 09, 2023 at 12:42:17AM +0900, Hector Martin wrote:
>> On 08/02/2023 19.35, Sudeep Holla wrote:
>>> On Wed, Feb 08, 2023 at 04:48:18AM +0900, Kazuki wrote:
>>>> On Mon, Feb 06, 2023 at 10:12:39AM +0000, Sudeep Holla wrote:
>>>>>
>>>>> What do you mean by break ? More details on the observation would be helpful.
>>>> For example, CLOCK_MONOTONIC doesn't stop even after suspend since
>>>> these chain of commands don't get called.
>>>>
>>>> call_cpuidle_s2idle->cpuidle_enter_s2idle->enter_s2idle_proper->tick_freeze->sched_clock_suspend (Function that pauses CLOCK_MONOTONIC)
>>>>
>>>> Which in turn causes programs like systemd to crash since it doesn't
>>>> expect this.
>>>
>>> Yes expected IIUC. The per-cpu timers and counters continue to tick in
>>> WFI and hence CLOCK_MONOTONIC can't stop.
>>
>> The hardware counters would also keep ticking in "real" s2idle (with
>> hypothetical PSCI idle support) and often in full suspend. There is a
>> flag for this (CLOCK_SOURCE_SUSPEND_NONSTOP) that is set by default for
>> ARM. So this isn't why CLOCK_MONOTONIC isn't stopping; there is
>> machinery to make the kernel's view of time stop anyway, it's just not
>> being invoked here.
>>
>
> Indeed, and I assumed s2idle was designed with those requirements but I
> think I may be wrong especially looking at few points you have raised
> provide my understanding is aligned with yours.
>
>> This is somewhat orthogonal to the issue of PSCI. These machines can't
>> physically support PSCI and KVM at the same time (they do not have EL3),
>> so PSCI is not an option. We will be starting a conversation about how
>> to provide something "like" PSCI over some other sort of transport to
>> solve this soon. So that will "fix" this issue once it's all implemented.
>>
>
> All the best for the efforts.
>
>> However, these machines aren't the only ones without PSCI (search for
>> "spin-table" in arch/arm64/boot/dts, this isn't new and these aren't the
>> first).
>
> Yes I am aware of it and if you see arch/arm64/kernel/smp_spin_table.c
> we don't support CPU hotplug or suspend for such a system.

We certainly support s2idle, except it's kind of broken as stated. Try
it, it works :-)

I didn't do anything special to enable s2idle on Apple platforms other
than make sure random drivers weren't broken and there was at least one
driver capable of triggering a wakeup. I just compile with
CONFIG_SUSPEND and s2idle works. Except for the part where
CLOCK_MONOTONIC keeps running. So generic kernels on spin_table
platforms ought to expose (broken) s2idle by default already.

>> It seems broken that Linux currently implements s2idle in such a
>> way that it violates the userspace clock behavior contract on systems
>> without a cpuidle driver (and does so silently, to make it worse).
>
> Just to check if I understand this correctly, are you referring to:
> cpuidle_idle_call()->default_idle_call() if cpuidle_not_available()
> And the problem is it idles there in wfi but CLOCK_MONOTONIC isn't
> stopping as expected by the userspace ? If so, fair enough. If not, I
> may be missing to understand something.

Right. I'm not too certain on the details of exactly what suspend
machinery is running/supposed to, because this CLOCK_MONOTONIC issue was
a surprise to me when it came up. From my point of view s2idle "just
worked", it's only now that this has come up that we're realizing it's
winding up in a very different codepath to what would happen were
cpuidle/PSCI available. This was all silent from the user POV (it all
looks like it suspends/resumes normally as far as I can tell).

>> So that should be fixed regardless of whether we end up coming up with a
>> PSCI alternative or not for these platforms.
>
> If above understanding is correct, I agree. But not sure what was the
> motivation behind the current behaviour.
>
>> There's no fundamental reason why s2idle can't work properly with plain WFI.
>>
>
> Fair enough. I hadn't thought much of it before as most of the platforms
> I have seen or used have at-least one deeper than WFI state these days.
> On arm32, this was common but each platform managed suspend_set_ops
> on its own and probably can do the same s2idle_set_ops.

Yeah, we do have one deeper idle state (and we should figure out how to
implement a PSCI alternative to enable it soon, since in particular for
certain SoCs plain WFI is quite a power hog since it keeps all the core
clusters powered up and at least partially clocked). But since we don't
have that yet, we've been using WFI-only s2idle so users have *some*
suspend ability.

- Hector