Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND PSCI impls

From: Konrad Dybcio
Date: Thu Dec 19 2024 - 14:21:08 EST


On 13.11.2024 9:05 AM, Manivannan Sadhasivam wrote:
> On Tue, Nov 12, 2024 at 08:04:34PM +0100, Konrad Dybcio wrote:
>>
>>
>> On 11/12/24 19:43, Manivannan Sadhasivam wrote:
>>> On Tue, Nov 12, 2024 at 07:32:36PM +0100, Konrad Dybcio wrote:
>>>>
>>>>
>>>> On 11/12/24 19:01, Manivannan Sadhasivam wrote:
>>>>> On Mon, Oct 28, 2024 at 03:22:56PM +0100, Konrad Dybcio wrote:
>>>>>> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
>>>>>> CPU_SUSPEND instead. Inform Linux about that.
>>>>>> Please see the commit messages for a more detailed explanation.
>>>>>>
>>>>>
>>>>> It is still not PSCI_SYSTEM_SUSPEND though...
>>>>
>>>> It *literally* does the same thing on devices where it's exposed.
>>>>
>>>
>>> But still...
>>
>> Still-what? We can't replace the signed firmware on (unironically) tens
>> of millions of devices in the wild and this is how it exposes that sleep
>> state. This is how arm platforms did it before the PSCI spec was
>> updated and SYSTEM_SUSPEND is *still optional today*.
>>
>
> I never asked you to replace the firmware in the first place, so don't attribute
> to me something I never said.

Never implied you did. I'm emphasizing the fact that we can't
update the firmware on such devices to expose PSCI_SYSTEM_SUSPEND.

> I see this approach as a way of abusing/faking PSCI system
> suspend.

And I disagree. I can't stress this enough, calling PSCI_SYSTEM_SUSPEND
is literally internally equivalent to calling PSCI_CPU_SUSPEND(magicval).

>
> Moreover, I heard from Bjorn that Qcom doesn't want to put the PCIe devices into
> D3Cold during system suspend for future platforms (based on their
> experimentation). So if drivers rely on this static information, then even Qcom
> cannot achieve what they want.
>
>>
>>>>>> This is effectively a more educated follow-up to [1].
>>>>>>
>>>>>> The ultimate goal is to stop making Linux think that certain states
>>>>>> only concern cores/clusters, and consequently setting
>>>>>> pm_set_suspend/resume_via_firmware(), so that client drivers (such as
>>>>>> NVMe, see related discussion over at [2]) can make informed decisions
>>>>>> about assuming the power state of the device they govern.
>>>>>>
>>>>>> If this series gets the green light, I'll push a follow-up one that wires
>>>>>> up said sleep state on Qualcomm SoCs across the board.
>>>>>>
>>>>>
>>>>> Sorry. I don't think PSCI is the right place for this. Qcom SoCs have a common
>>>>> firmware across all segments (mostly),
>>>>
>>>> This ^
>>>>
>>>>> so there is no S2R involved and only S2Idle.
>>>>
>>>> is not at all related to this ^, the "so" makes no sense.
>>>>
>>>> (also you're wrong, this *is* S2RAM)
>>>>
>>>
>>> What? Qcom SoCs supporting S2R? I've never heard of that.
>>
>> Maybe you're thinking of hibernation, which is not widely (if at all)
>> supported.
>>
>
> Not hibernation. The Qcom platforms I'm aware of all support only S2Idle. I
> don't work for Qcom, so I may be missing some insider information.

I think this is the main source of misunderstanding in this entire thread.

CXPC is S2RAM. Not S2idle.

Shallower sleep states on QC platforms are S2idle.

>>>>> If you use PSCI to implement suspend_via_firmware(), then all the SoCs
>>>>> making use of the PSCI implementation will have the same behavior. I don't think
>>>>> we would want that.
>>>>
>>>> This is an issue with the NVMe framework that is totally unrelated to this
>>>> change, see below. Also, the code only sets that on targets where such state
>>>> exists and is described.
>>>>
>>>
>>> Well, you are doing it just because you want the NVMe device to learn about the
>>> platform requirement.
>>
>> And I can't see why you're having a problem with this. It's exactly how it
>> works on x86 too. Modern Standby also shuts down storage on Windows,
>> regardless of the CPU architecture.
>
> It is not just my problem. I'm expressing the concern that NVMe folks have and
> already expressed over the similar solutions I proposed. And I cannot just
> overrule them.

Sure, but if PSCI_SYSTEM_SUSPEND implies S2RAM, why should the behavior be
different purely based on the architectural idle implementation?

Moreover, if the same platform can be booted with ACPI or DT, why should
power state switching work differently, considering both would describe
the hardware accurately?

>>>>> For instance, if a Qcom SoC is used in an android tablet with the same firmware,
>>>>> then this would allow the NVMe device to be turned off during system suspend all
>>>>> the time when user presses the lock button. And this will cause NVMe device to
>>>>> wear out faster. The said approach will work fine for non-android usecases
>>>>> though.
>>>>
>>>> The NVMe framework doesn't make a distinction between "phone screen off" and
>>>> "laptop lid closed & thrown in a bag" on *any* platform. The usecase you're
>>>> describing is not supported as of today since nobody *actually* has NVMe on a
>>>> phone that also happens to run upstream Linux.
>>>> I'm not going to solve imaginary problems.
>>>>
>>>
>>> Not just phone, NVMe device could be running on an android tablet.
>>
>> 'Could' very much makes it imaginary. There are no supported devices that
>> fall into this category.
>>
>
> Agree that there are no products in the market (yet). But having NVMe on
> handheld devices is not something I would quote as 'imaginary'.
>
>>> I'm not
>>> talking about an imaginary problem, but a real problem that is in the foreseeable
>>> future
>>
>> Keyword: future. This issue has been on hold for years because of 'issues'
>> that are pinky promised to happen eventually, without anyone suggesting any
>> actually acceptable solutions. This just undermines progress.
>>
>
> Not true. There are solutions suggested, but then it always takes time to reach
> consensus. One of the approaches that I'm about to propose is to have a userspace
> knob that specifies whether the device can be powered down or not (leaving the
> default behavior to put them in a low power state), because the decision to power
> the devices down or put them in a low power state sounds more like a userspace
> policy. It was discussed at LPC 2023.

Sure, however I believe it is perfectly reasonable to change the
default setting there based on platform capabilities.

Konrad

>
>>> (that is also the reason why NVMe developers don't want to always put the
>>> device into power down mode during system suspend).
>>
>> This is the current behavior on any new x86 laptop, and has been for a
>> couple of years.
>>
>>> And with this change, you are just going to make the NVMe lifetime miserable on
>>> those platforms.
>>
>> Fearmongering and hearsay. See above.
>>
>
> I can only wish you the best of luck with this approach!
>
> - Mani
>