Re: [PATCH] ACPI: PHAT: Add Platform Health Assessment Table support

From: Limonciello, Mario
Date: Mon Aug 21 2023 - 22:54:17 EST




On 8/21/2023 6:33 PM, Yazen Ghannam wrote:
On 8/21/23 3:23 PM, Limonciello, Mario wrote:


On 8/21/2023 2:16 PM, Rafael J. Wysocki wrote:
<snip>
Is there a preferred set of tools that can be updated?

I think you need to talk to distro people about this.

If not, would it make sense to develop a set of common kernel tools for
this?

Yes, it would, but please see above in the first place.

In my experience, it seems many folks use tools from their vendors or
custom tools.

This observation matches my own experience.

For the sake of discussion, and from a kernel developer's point of view,
should the tools be part of a separate project? Or should the tools be
part of the kernel tree like perf, etc.? This assumes the work needs to
start from scratch rather than extend an existing project.

It can be either in principle, but from a practical standpoint it is
more likely that everyone will use the same set of tools if they are
included in the kernel source tree.

Yazen,

You generally envision tools like this being used only when there is a problem, and not as something that runs in the critical path on every boot, right?


Hi Mario,

Generally, I think yes. But you summarized one issue earlier, and that
is the case where a user doesn't explicitly fetch the information and it
gets lost. This can be especially painful if the issue is difficult to
reproduce or has a long time to failure. Of course, this is new and
supplemental info, but every clue helps during debug.

Some highlights from the ACPI spec...

The PHAT is not urgent nor actionable by the OS:
"It is not expected that the OSPM would act on the data being exposed."

The info may be useful on each boot regardless of any problems:
"The Reset Reason Health Record defines a mechanism to describe the
cause of the last system reset or boot. The record will be created as a
Health Record in the PHAT table. This provides a standard way for system
firmware to inform the operating system of the cause of the last reset.
This includes both expected and unexpected events to support insights
across a fleet of systems by way of collecting the reset reason records
on each boot."

Note that it says "last reset", so it doesn't seem intended to keep a
running log to be fetched later.

If so, how about doing it in a high-level language with easily importable libraries, like Python?


This sounds good to me. Anything that can handle binary files.
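
To give a rough idea of the binary handling involved, here is a minimal sketch that unpacks the standard 36-byte ACPI table header and checks the table checksum. The names (parse_acpi_header, checksum_ok, AcpiHeader) are made up for illustration and are not from any existing tool.

```python
import struct
from collections import namedtuple

# Standard 36-byte ACPI table header, little-endian, no padding:
# signature, length, revision, checksum, OEM ID, OEM table ID,
# OEM revision, creator ID, creator revision.
ACPI_HEADER = struct.Struct("<4sIBB6s8sI4sI")

# Hypothetical container type for the unpacked fields.
AcpiHeader = namedtuple(
    "AcpiHeader",
    "signature length revision checksum oem_id oem_table_id "
    "oem_revision creator_id creator_revision",
)


def parse_acpi_header(blob: bytes) -> AcpiHeader:
    """Unpack the common header found at the start of an ACPI table."""
    if len(blob) < ACPI_HEADER.size:
        raise ValueError("table is shorter than an ACPI header")
    return AcpiHeader._make(ACPI_HEADER.unpack_from(blob))


def checksum_ok(blob: bytes) -> bool:
    """All bytes of a valid table sum to zero modulo 256."""
    return sum(blob) % 256 == 0
```

Something like parse_acpi_header(data).signature == b"PHAT" would then be a cheap sanity check before looking at any records.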

Then the tools can still be stored "in kernel tree" and distributed with distro "kernel tools" packages, but you can more easily use them on random old kernels too, since the binary tables under /sys/firmware/acpi/tables should be widely available.
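
As a sketch of what such a tool might start from: read the raw table out of sysfs and walk its records. This assumes each PHAT record starts with a 2-byte type, a 2-byte record length, and a 1-byte revision, which is my reading of the spec and should be checked against it. The function names are hypothetical, and reading the sysfs file typically needs root.

```python
import struct
from pathlib import Path

ACPI_HEADER_SIZE = 36                   # standard ACPI table header
RECORD_HEADER = struct.Struct("<HHB")   # assumed: type, length, revision


def read_table(name="PHAT", root="/sys/firmware/acpi/tables"):
    """Return the raw bytes of one ACPI table exposed in sysfs."""
    return Path(root, name).read_bytes()


def iter_phat_records(table: bytes):
    """Yield (type, revision, payload) for each record after the header."""
    # The table length lives at offset 4 of the ACPI header.
    (table_len,) = struct.unpack_from("<I", table, 4)
    end = min(table_len, len(table))
    offset = ACPI_HEADER_SIZE
    while offset + RECORD_HEADER.size <= end:
        rtype, rlen, rev = RECORD_HEADER.unpack_from(table, offset)
        # Stop on a malformed record instead of looping forever.
        if rlen < RECORD_HEADER.size or offset + rlen > end:
            break
        yield rtype, rev, table[offset + RECORD_HEADER.size:offset + rlen]
        offset += rlen
```

A boot-time service could call iter_phat_records(read_table()) and log whatever it finds, without the kernel needing to know about the table at all.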

Yes, I agree. And I think we should give examples for running the tools
as services at boot. And documentation is needed, of course.

I don't exactly follow your last statement. Do you mean that new ACPI
tables will be exposed in sysfs even without explicit kernel updates?

Yeah, that's what I meant. For example, look at other tables the kernel doesn't parse, like SLIC or MSDM. Those needed no kernel changes to show up there.
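
A quick way to see which tables a given system exposes is just to list that directory. A minimal sketch, with list_acpi_tables being a made-up helper name:

```python
import os


def list_acpi_tables(root="/sys/firmware/acpi/tables"):
    """Return the sorted names of the table files directly under root."""
    names = []
    for entry in os.scandir(root):
        # Skip subdirectories; only regular entries are table blobs.
        if entry.is_file():
            names.append(entry.name)
    return sorted(names)
```

If "PHAT" shows up in that list on a kernel with no PHAT parsing at all, a userspace tool has everything it needs.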