Re: [PATCH v1] platform/x86: intel_pmc_core: promote S0ix failure warn() to WARN()

From: Limonciello, Mario
Date: Mon Oct 31 2022 - 16:56:47 EST


On 10/31/2022 15:47, Rajat Jain wrote:
Hello,

On Mon, Oct 31, 2022 at 12:39 PM Limonciello, Mario
<Mario.Limonciello@xxxxxxx <mailto:Mario.Limonciello@xxxxxxx>> wrote:
>
> [Public]
>
> > -----Original Message-----
> > From: Sven van Ashbrook <svenva@xxxxxxxxxxxx <mailto:svenva@xxxxxxxxxxxx>>
> > Sent: Friday, October 28, 2022 23:12
> > To: Rajneesh Bhardwaj <irenic.rajneesh@xxxxxxxxx <mailto:irenic.rajneesh@xxxxxxxxx>>; Hans de Goede
> > <hdegoede@xxxxxxxxxx <mailto:hdegoede@xxxxxxxxxx>>
> > Cc: Limonciello, Mario <Mario.Limonciello@xxxxxxx <mailto:Mario.Limonciello@xxxxxxx>>; LKML <linux-
> > kernel@xxxxxxxxxxxxxxx <mailto:kernel@xxxxxxxxxxxxxxx>>; S-k, Shyam-sundar <Shyam-sundar.S-
> > k@xxxxxxx <mailto:k@xxxxxxx>>; rrangel@xxxxxxxxxxxx <mailto:rrangel@xxxxxxxxxxxx>; platform-driver-
> > x86@xxxxxxxxxxxxxxx <mailto:x86@xxxxxxxxxxxxxxx>; Rajneesh Bhardwaj <rajneesh.bhardwaj@xxxxxxxxx <mailto:rajneesh.bhardwaj@xxxxxxxxx>>;
> > Rafael J Wysocki <rjw@xxxxxxxxxxxxx <mailto:rjw@xxxxxxxxxxxxx>>; Rajat Jain <rajatja@xxxxxxxxxx <mailto:rajatja@xxxxxxxxxx>>;
> > David E Box <david.e.box@xxxxxxxxx <mailto:david.e.box@xxxxxxxxx>>; Mark Gross <markgross@xxxxxxxxxx <mailto:markgross@xxxxxxxxxx>>
> > Subject: Re: [PATCH v1] platform/x86: intel_pmc_core: promote S0ix failure
> > warn() to WARN()
> >
> > On Thu, Oct 27, 2022 at 12:02 PM Rajneesh Bhardwaj
> > <irenic.rajneesh@xxxxxxxxx <mailto:irenic.rajneesh@xxxxxxxxx>> wrote:
> > > I'd advise against this promotion based on my experience with S0ix entry
> > failures.
> >
> > On Thu, Oct 27, 2022 at 11:40 AM Hans de Goede <hdegoede@xxxxxxxxxx <mailto:hdegoede@xxxxxxxxxx>>
> > wrote:
> > > I'm not a fan of the change you are suggesting here.
> >
> > Thanks everyone for the feedback. Looks like there is consensus that it's
> > not advisable to promote the warning. We will move forward with changes to
> > our monitoring infrastructure instead.
>
> Did you see the idea proposed by David Box to introduce some infrastructure in
> the kernel for this?
>
> Just thinking about it a little bit more, it could be a lot nicer to have something like:
>
> /sys/power/suspend_stats/last_hw_deepest_state
>
> During the resume process drivers such as amd_pmc and intel_pmc_core could
> read the appropriate values for the hardware and call a function that would
> populate it with either a "0" or "1" or maybe even the amount of time spent in
> that state.
>
> We could then retire the debugging messages from both drivers

I do not think we should retire the debug messages. The sysfs
attribute could help us *trigger* a failure detection, but we would
still need these debug logs to actually determine why exactly we did
not go into the S0ix / deepest power state (And the debug messages
print out the debug register bit fields that let us know that).

Thanks,


I just spun together an RFC series for this idea and while doing it I had the same realization. So I left the warning messages in place for both drivers.

You can take a look at the series here:

https://lore.kernel.org/platform-driver-x86/20221031204320.22464-1-mario.limonciello@xxxxxxx/T/#m6c7db55c98b8a3ce8c48d451fc01c1d9b0df37fb

Thanks,