Re: [PATCH v1] platform/x86: intel_pmc_core: promote S0ix failure warn() to WARN()

From: Mario Limonciello
Date: Mon Oct 31 2022 - 21:58:48 EST


On 10/31/22 20:38, Sven van Ashbrook wrote:
On Mon, Oct 31, 2022 at 3:39 PM Limonciello, Mario
<Mario.Limonciello@xxxxxxx> wrote:

Just thinking about it a little bit more, it could be a lot nicer to have something like:

/sys/power/suspend_stats/last_hw_deepest_state

While I agree that reporting through a framework is generally better
than getting infrastructure to grep for specific strings, I believe
that a simple sysfs file is probably too simplistic.

1. We need more sophisticated reporting than just last_hw_deepest_state:

- sometimes the system enters the deep state we want, yet after a
while moves back up and gets "stuck" in an intermediate state (below
S0). Or, the system enters the deep state we want, but moves back to
S0 after a time without apparent reason. These platform-dependent
failures are not so easily describable in a generic framework.

I actually thought that by reporting the time spent in the deepest state through last_hw_deepest_state, you'd be able to catch this by comparing the duration of the suspend to the duration in last_hw_deepest_state.

If the share of the suspend spent in the deepest state falls below some percentage threshold, for suspends that last at least some minimum length, you can trigger the failure.

This then lets you tune your framework to find the right place for those
thresholds too without needing to change the kernel.
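
As an illustration only, the userspace side of that check could be sketched in Python roughly as follows. The function name and both default thresholds are made up for the example; how the two durations are obtained is left open, since the sysfs file is only a proposal at this point.

```python
def hw_sleep_failed(suspend_s, hw_deepest_s, min_suspend_s=60.0, min_ratio=0.9):
    """Return True when a long-enough suspend spent too little time in the
    deepest hardware state.

    suspend_s     -- total duration of the suspend, in seconds
    hw_deepest_s  -- time spent in the deepest hardware state, in seconds
    min_suspend_s -- hypothetical threshold: ignore suspends shorter than this
    min_ratio     -- hypothetical threshold: required fraction of the suspend
                     spent in the deepest state
    """
    # Very short (or nonsensical) suspends are not judged at all.
    if suspend_s <= 0 or suspend_s < min_suspend_s:
        return False
    return (hw_deepest_s / suspend_s) < min_ratio
```

Both thresholds are ordinary arguments, so they can be tuned per platform in the reporting infrastructure.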


- ChromeOS in particular has multiple independent S0ix / S3 / s2idle
failure report sources. We have the kernel warning above; also our
Embedded Controller monitors suspend failure cases which the simple
kernel warning cannot catch, reported through a separate WARN_ONCE().

2. A simple sysfs file will need to be polled by the infrastructure
after every suspend; it would be preferable to have some signal or
callback which the infrastructure could register itself with.

The interface to trigger a suspend is writing a value into /sys/power/state. You'll get a return code from this, but this return code does not represent whether you got to the deepest state, just whether the suspend succeeded or not.
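
For reference, a minimal Python sketch of what that looks like from userspace. request_suspend is a made-up helper name, and the path is a parameter only so the function can be exercised against a scratch file instead of suspending a machine.

```python
def request_suspend(state="mem", path="/sys/power/state"):
    """Write a sleep state to the given file and return 0 on success,
    or the errno of the failed open/write otherwise.

    A return of 0 only says the suspend/resume cycle completed; it says
    nothing about which hardware state was reached in between.
    """
    try:
        # On the real sysfs file the write blocks until the system resumes;
        # a failed suspend surfaces as an OSError on write/flush/close.
        with open(path, "w") as f:
            f.write(state)
    except OSError as e:
        return e.errno
    return 0
```

Writing to the real /sys/power/state needs root and will actually suspend the system, so the sketch is never called with the default path here.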

So what would an ideal interface that sends a signal that the last "successful" suspend didn't get to the deepest state look like to you?


The generic infrastructure to support this sounds like quite a bit of
work, and for what gain? Compared to simply matching a log string and
sending the whole dmesg if there's a match.

I would like to think it's cheaper to read the sysfs file, do a local comparison of the HW deepest time to the suspend time, and then only send the dmesg up for further analysis.


Is the light worth the candle?

I wrote up my ideas in an RFC and sent it out, at least.