RE: [PATCH v1] platform/x86: intel_pmc_core: promote S0ix failure warn() to WARN()

From: Limonciello, Mario
Date: Mon Oct 31 2022 - 15:39:36 EST


[Public]

> -----Original Message-----
> From: Sven van Ashbrook <svenva@xxxxxxxxxxxx>
> Sent: Friday, October 28, 2022 23:12
> To: Rajneesh Bhardwaj <irenic.rajneesh@xxxxxxxxx>; Hans de Goede
> <hdegoede@xxxxxxxxxx>
> Cc: Limonciello, Mario <Mario.Limonciello@xxxxxxx>; LKML <linux-
> kernel@xxxxxxxxxxxxxxx>; S-k, Shyam-sundar <Shyam-sundar.S-
> k@xxxxxxx>; rrangel@xxxxxxxxxxxx; platform-driver-
> x86@xxxxxxxxxxxxxxx; Rajneesh Bhardwaj <rajneesh.bhardwaj@xxxxxxxxx>;
> Rafael J Wysocki <rjw@xxxxxxxxxxxxx>; Rajat Jain <rajatja@xxxxxxxxxx>;
> David E Box <david.e.box@xxxxxxxxx>; Mark Gross <markgross@xxxxxxxxxx>
> Subject: Re: [PATCH v1] platform/x86: intel_pmc_core: promote S0ix failure
> warn() to WARN()
>
> On Thu, Oct 27, 2022 at 12:02 PM Rajneesh Bhardwaj
> <irenic.rajneesh@xxxxxxxxx> wrote:
> > I'd advise against this promotion based on my experience with S0ix entry
> failures.
>
> On Thu, Oct 27, 2022 at 11:40 AM Hans de Goede <hdegoede@xxxxxxxxxx>
> wrote:
> > I'm not a fan of the change you are suggesting here.
>
> Thanks everyone for the feedback. Looks like there is consensus that it's
> not advisable to promote the warning. We will move forward with changes to
> our monitoring infrastructure instead.

Did you see the idea proposed by David Box to introduce some infrastructure in
the kernel for this?

Just thinking about it a little bit more, it could be a lot nicer to have something like:

/sys/power/suspend_stats/last_hw_deepest_state

During the resume process drivers such as amd_pmc and intel_pmc_core could
read the appropriate values for the hardware and call a function that would
populate it with either a "0" or "1" or maybe even the amount of time spent in
that state.

We could then retire the debugging messages from both drivers and instead of
your infrastructure relying upon scanning logs, userspace could read that sysfs
file after every suspend and raise the alarms when it's 0 (or if it's populated with
time smaller than X% of the total suspend time).