Re: [PATCH v2] xen/acpi: upload power and performance related data from a PVH dom0

From: Jason Andryuk
Date: Tue Dec 03 2024 - 17:39:42 EST


On 2024-09-25 05:17, Jürgen Groß wrote:
On 16.09.24 22:50, Jason Andryuk wrote:
From: Roger Pau Monne <roger.pau@xxxxxxxxxx>

When running as a PVH dom0 the ACPI MADT is crafted by Xen in order to
report the correct numbers of vCPUs that dom0 has, so the host MADT is
not provided to dom0.  This creates issues when parsing the power and
performance related data from ACPI dynamic tables, as the ACPI
Processor UIDs found on the dynamic code are likely to not match the
ones crafted by Xen in the dom0 MADT.

Xen would rely on Linux having filled at least the power and
performance related data of the vCPUs on the system, and would clone
that information in order to setup the remaining pCPUs on the system
if dom0 vCPUs < pCPUs.  However when running as PVH dom0 it's likely
that none of dom0 CPUs will have the power and performance data
filled, and hence the Xen ACPI Processor driver needs to fetch that
information by itself.

In order to do so correctly, introduce a new helper to fetch the _CST
data without taking into account the system capabilities from the
CPUID output, as the capabilities reported to dom0 in CPUID might be
different from the ones on the host.

Note that the newly introduced code will only fetch the _CST, _PSS,
_PPC and _PCT from a single CPU, and clone that information for all the
other Processors.  This won't work on an heterogeneous system with
Processors having different power and performance related data between
them.

Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
Signed-off-by: Jason Andryuk <jason.andryuk@xxxxxxx>
---
v2:
Add second buffer for _CST.  Call was failing with
AE_BUFFER_OVERFLOW(0x000b)

Running a PVH Dom0 on AMD, I needed this v2 change to get the C-State
information uploaded.

---
  drivers/xen/pcpu.c               |   3 +-
  drivers/xen/xen-acpi-processor.c | 230 ++++++++++++++++++++++++++++---
  include/xen/xen.h                |   2 +-
  3 files changed, 216 insertions(+), 19 deletions(-)

diff --git a/drivers/xen/pcpu.c b/drivers/xen/pcpu.c
index c63f317e3df3..dc9f2c14bf62 100644
--- a/drivers/xen/pcpu.c
+++ b/drivers/xen/pcpu.c

...

@@ -354,24 +511,44 @@ read_acpi_id(acpi_handle handle, u32 lvl, void *context, void **rv)
      default:
          return AE_OK;
      }
-    if (invalid_phys_cpuid(acpi_get_phys_id(handle,
-                        acpi_type == ACPI_TYPE_DEVICE,
-                        acpi_id))) {
+
+    if (!xen_processor_present(acpi_id)) {
          pr_debug("CPU with ACPI ID %u is unavailable\n", acpi_id);
          return AE_OK;
      }
-    /* There are more ACPI Processor objects than in x2APIC or MADT.
-     * This can happen with incorrect ACPI SSDT declerations. */
-    if (acpi_id >= nr_acpi_bits) {
-        pr_debug("max acpi id %u, trying to set %u\n",
-             nr_acpi_bits - 1, acpi_id);
-        return AE_OK;
-    }
+
      /* OK, There is a ACPI Processor object */
      __set_bit(acpi_id, acpi_id_present);
      pr_debug("ACPI CPU%u w/ PBLK:0x%lx\n", acpi_id, (unsigned long)pblk);
+    if (!pr_initialized) {
+        struct acpi_processor *pr = context;
+        int rc;
+
+        /*
+         * There's no CPU on the system that has any performance or
+         * power related data, initialize all the required fields by
+         * fetching that info here.
+         *
+         * Note such information is only fetched once, and then reused
+         * for all pCPUs.  This won't work on heterogeneous systems
+         * with different Cx anb/or Px states between CPUs.
+         */
+
+        pr->handle = handle;
+
+        rc = acpi_processor_get_performance_info(pr);
+        if (rc)
+            pr_debug("ACPI CPU%u failed to get performance data\n",
+                 acpi_id);

Is it really normal to get a failure here? Shouldn't the reaction
be a little bit more visible in this case?

And can you just continue processing?

+        rc = xen_acpi_processor_evaluate_cst(handle, &pr->power);
+        if (rc)
+            pr_debug("ACPI CPU%u failed to get _CST data\n", acpi_id);

Same again. Is pr_debug() enough?

Thanks. I'll switch them to pr_err(). And I'll only set pr_initialized = true when both calls succeed.

Regards,
Jason