Re: [PATCH v1] platform/x86: intel_pmc_core: promote S0ix failure warn() to WARN()

From: Hans de Goede
Date: Mon Oct 31 2022 - 16:16:47 EST


Hi,

On 10/31/22 20:39, Limonciello, Mario wrote:
> [Public]
>
>> -----Original Message-----
>> From: Sven van Ashbrook <svenva@xxxxxxxxxxxx>
>> Sent: Friday, October 28, 2022 23:12
>> To: Rajneesh Bhardwaj <irenic.rajneesh@xxxxxxxxx>; Hans de Goede
>> <hdegoede@xxxxxxxxxx>
>> Cc: Limonciello, Mario <Mario.Limonciello@xxxxxxx>; LKML <linux-
>> kernel@xxxxxxxxxxxxxxx>; S-k, Shyam-sundar <Shyam-sundar.S-
>> k@xxxxxxx>; rrangel@xxxxxxxxxxxx; platform-driver-
>> x86@xxxxxxxxxxxxxxx; Rajneesh Bhardwaj <rajneesh.bhardwaj@xxxxxxxxx>;
>> Rafael J Wysocki <rjw@xxxxxxxxxxxxx>; Rajat Jain <rajatja@xxxxxxxxxx>;
>> David E Box <david.e.box@xxxxxxxxx>; Mark Gross <markgross@xxxxxxxxxx>
>> Subject: Re: [PATCH v1] platform/x86: intel_pmc_core: promote S0ix failure
>> warn() to WARN()
>>
>> On Thu, Oct 27, 2022 at 12:02 PM Rajneesh Bhardwaj
>> <irenic.rajneesh@xxxxxxxxx> wrote:
>>> I'd advise against this promotion based on my experience with S0ix entry
>> failures.
>>
>> On Thu, Oct 27, 2022 at 11:40 AM Hans de Goede <hdegoede@xxxxxxxxxx>
>> wrote:
>>> I'm not a fan of the change you are suggesting here.
>>
>> Thanks everyone for the feedback. Looks like there is consensus that it's
>> not advisable to promote the warning. We will move forward with changes to
>> our monitoring infrastructure instead.
>
> Did you see the idea proposed by David Box to introduce some infrastructure in
> the kernel for this?
>
> Just thinking about it a little bit more, it could be a lot nicer to have something like:
>
> /sys/power/suspend_stats/last_hw_deepest_state
>
> During the resume process drivers such as amd_pmc and intel_pmc_core could
> read the appropriate values for the hardware and call a function that would
> populate it with either a "0" or "1" or maybe even the amount of time spent in
> that state.
>
> We could then retire the debugging messages from both drivers and instead of
> your infrastructure relying upon scanning logs, userspace could read that sysfs
> file after every suspend and raise the alarms when it's 0 (or if it's populated with
> time smaller than X% of the total suspend time).

Something like this does indeed sound like a nice solution for this.

Regards,

Hans