Re: [PATCH v4 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID

From: Chen, Yu C

Date: Mon Jun 22 2026 - 05:26:59 EST


Hi Reinette,
Thanks very much for the detailed review.

On 6/19/2026 7:37 AM, Reinette Chatre wrote:
Hi Chenyu,

On 6/13/26 12:56 AM, Chen Yu wrote:
From: Anil S Keshavamurthy <anil.s.keshavamurthy@xxxxxxxxx>

ERDT(Enhanced RDT) introduces a new top-level ACPI structure

(nit: please add space between ERDT and the "(")


OK, will do.

(the ERDT) that the kernel must parse before any enhanced
RDT feature can be used. The ERDT improves the existing RDT
by switching low-level register access from MSR-based to
MMIO-based, which is more efficient.

(nit: please use full line length available for changelog)


OK, will do.

Above states clearly "switching low-level register access from MSR-based to
MMIO-based". This implementation clearly does so ... but the new ACPI tables
supporting the MMIO-based access also include, for example, the
maximum RMIDs that determines how many MMIO registers are supported.
Even so, from what I can tell, the "maximum RMID" that the ACPI tables
report is ignored and this implementation instead continues to rely on
the maximum RMID reported via CPUID that indicates the maximum RMIDs supported
by the *MSR* to also guide the access of the MMIO registers.

This implementation thus looks to straddle some middle ground between
ACPI and CPUID enumeration, using return of one to guide access to the
other that does not seem robust and requires platforms to forever support both.



I see. If we rely on ERDT information, we should use the "maximum RMID"
exposed via ACPI tables instead of CPUID. I will adjust this accordingly.

memory. As a trade-off, an xarray is used to store these tables, with

(nit: needs imperative eg. "use an xarray ...")


OK, will change it accordingly.

the L3 cache ID as the key.
arch/x86/Kconfig | 4 +-
arch/x86/include/asm/apic.h | 1 +
arch/x86/include/asm/resctrl.h | 2 +
arch/x86/kernel/cpu/resctrl/Makefile | 1 +
arch/x86/kernel/cpu/resctrl/core.c | 14 +-
arch/x86/kernel/cpu/resctrl/erdt.c | 296 +++++++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/internal.h | 3 +
arch/x86/kernel/cpu/topology.c | 2 +-

Please split out the non-resctrl changes into separate patch with sybsystem-specific
prefix to highlight these changes to not appear to sneak changes into other
subsystems via resctrl. They can stil form part of this series as preparatory patches.


I see, let me split two of them out:
x86/topology: Export topo_lookup_cpuid for module use
x86/apic: Add declaration for topo_lookup_cpuid


config X86_CPU_RESCTRL
bool "x86 CPU resource control support"
- depends on X86 && (CPU_SUP_INTEL || CPU_SUP_AMD)
+ depends on X86_64 && (CPU_SUP_INTEL || CPU_SUP_AMD)
depends on MISC_FILESYSTEMS
select ARCH_HAS_CPU_RESCTRL
select RESCTRL_FS
@@ -538,7 +538,7 @@ config X86_CPU_RESCTRL
config X86_CPU_RESCTRL_INTEL_AET
bool "Intel Application Energy Telemetry"
- depends on X86_64 && X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
+ depends on X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
help
Enable per-RMID telemetry events in resctrl.

Removing 32bit support is a significant change worth a mention and motivation in the changelog.


The original idea was that MMIO-based access (including CMT/MBM/MBA) uses readq/writeq,
which depend on 64-bit. Since MMIO CMT relies on CONFIG_X86_CPU_RESCTRL, making
CONFIG_X86_CPU_RESCTRL depend on X86_64 would simplify the code.
If I understand correctly, 32-bit x86 lacks practical hardware support for RDT/QoS.
The Kconfig entry currently depends on X86, which technically permits 32-bit builds.
However, I am unsure whether there exists any real-world use case where a user compiles
a 32-bit kernel on a server and enables resctrl?
If we wish to retain 32-bit RDT support, I can introduce a new config option to gate ERDT:
config X86_CPU_RESCTRL_ERDT
bool
depends on X86_CPU_RESCTRL && X86_64 && ACPI
default y
Could you advise which approach is preferable?


diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 575f8408a9e7..97c2f6bc7a5f 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -40,6 +40,8 @@ struct resctrl_pqr_state {
u32 default_closid;
};
+bool erdt_enabled(void);
+

I cannot see why this needs to be in asm header ... why not arch/x86/kernel/cpu/resctrl/internal.h?


OK, let me move it into internal.h. By the way, I believe fs/resctrl
also needs to call erdt_enabled() for region-aware MBA/MBM. Would it
be acceptable to expose erdt_enabled() in include/linux/resctrl.h?

+ * Enhanced Resource Director Technology(ERDT)

(nit: please add space)


OK, will do.


+#include <linux/cleanup.h>
+#include <linux/cpu.h>
+#include <linux/err.h>
+#include <linux/xarray.h>
+#include <linux/resctrl.h>
+#include <linux/acpi.h>

Please sort the includes alphabetically and add empty lines between the groups of headers.


OK, will do.

+#include <asm/apic.h>
+#include <asm/cpu_device_id.h>

Why is above #include needed?


It was included in the previous version for CPU count queries,
but it is no longer needed; I will remove it.

+#include "internal.h"
+


Please add a comment for the enum below to describe what the different members represent.


OK, will do.

+enum erdt_mmio_type {
+ ERDT_MMIO_RMDD_CREG,
+ ERDT_MMIO_CMRC_BASE,
+ ERDT_MMIO_MAX
+};

Depending on how this enum ends up being used, could it benefit from a pattern like
https://lore.kernel.org/lkml/4a304e6c9ed3f5d92156b8045f5dd4ddba359dfc.1777419024.git.reinette.chatre@intel ?


Got it, thanks for sharing this, will do.

+
+struct erdt_domain_info {
+ void __iomem *base[ERDT_MMIO_MAX];
+ struct acpi_erdt_cmrc *cmrc;
+};
+
+static bool erdt_enabled_flag;

What is significance of "flag"? Can it just be "erdt_enabled"? Since it is a bool this seems
more intuitive?


erdt_enabled() was already used as a function name. Let me change it to static bool is_enabled

+
+static DEFINE_XARRAY(erdt_domain_xa);

Please add comment to describe what is stored in erdt_domain_xa and what the
index represents.


OK, will do.

+
+#define ERDT_VALID_VERSION 1
+#define RMDD_FLAG_CPU_DOMAIN BIT(0)

I think this name can be more accurate, how about, for example "RMDD_FLAG_L3_DOMAIN"?


According to the spec, this flag indicates: CPU L3 cache and I/O L3 Domain, maybe using
RMDD_FLAG_CPU_L3_DOMAIN?

+
+static u32 valid_subtbl_mask;

Please add comment to describe what above global represents.


OK, will do.

+
+bool erdt_enabled(void)
+{
+ return erdt_enabled_flag;
+}
+
+/**
+ * get_l3_cache_id_from_cacd - Resolve L3 cache ID from CACD subtable

As I interpret it ... above does not describe to be what this function does.

+ * @cacd: Pointer to the ACPI ERDT CACD structure
+ *
+ * Parses the X2APIC ID list in the given CACD subtable to
+ * identify an online logical CPU and uses it to query the associated
+ * L3 cache ID. The first valid CPU found is used for this lookup.

This function is __init and requires at least one CPU per domain to be online.
What will happen if a system is booted with all CPUs of a domain offline and
then later a CPU is online'd?


Got it, this appears to be a corner case bug: offlined CPUs within a domain never
get an opportunity to retrieve their L3 ID. I will figure out how to handle this
case, potentially by leveraging resctrl_arch_online_cpu() (or maybe use other
method)

+ *
+ * The L3 cache ID is used as a unique domain key for ERDT domain
+ * registration and lookup.
+ *
+ * Return: L3 cache ID for the first matching CPU, or -1 on failure.

Could you please use consistent terminology? Above comments switches between
"online logical CPU" and "valid CPU" and "matching CPU" that all seem to refer
to the same idea.


OK, let me switch to "CPU" in all cases.

+ */
+static __init int get_l3_cache_id_from_cacd(struct acpi_erdt_cacd *cacd)
+{
+ int num_ids, cpu, online_cpu = -1, cache_id = -1, tmp;
+ struct cacheinfo *ci;
+
+ if (cacd->header.length < sizeof(*cacd) + sizeof(cacd->X2APICIDS[0])) {

Is above an open code of struct_size()?


OK, will change it to
if (cacd->header.length < struct_size(cacd, X2APICIDS, 1))

+ pr_warn(FW_BUG "Invalid x2apicid CACD table\n");

This patch displays several "FW_BUG" messages but they are sometimes printed with
pr_info() and sometimes with pr_warn(). Is this intentional?


Let me switch all the print to pr_warn(FW_BUG).

+ return -1;
+ }
+
+ num_ids = (cacd->header.length - sizeof(*cacd)) / sizeof(cacd->X2APICIDS[0]);
+
+ guard(cpus_read_lock)();
+
+ for (int i = 0; i < num_ids; i++) {
+ cpu = topo_lookup_cpuid(cacd->X2APICIDS[i]);
+ if (cpu < 0) {
+ pr_warn(FW_BUG "Unknown x2apicid 0x%x\n", cacd->X2APICIDS[i]);
+
(nit: unnecessary empty line)



OK, will remove it.

+ return -1;
+ }
+
+ if (!cpu_online(cpu))
+ continue;
+
+ tmp = get_cpu_cacheinfo_id(cpu, RESCTRL_L3_CACHE);

From what I can tell it is tmp that is essentially returned by this function and it
is the result of get_cpu_cacheinfo_id() that returns the cache ID initialized from
CPUID leaf 0x4 ... the function name ("get_l3_cache_id_from_cacd()") and function comment
"Resolve L3 cache ID from CACD subtable" implies differently and the way this is done
makes me wonder why CACD needs to be parsed at all.

resctrl already has a hotplug handler and it dynamically determines the domain ID
based on the cache ID returned from get_cpu_cacheinfo_id(). If the domain ID used by the
new ERDT tables can be, as it appears is done here, trusted to what Linux already
treating as domain ID then why is it needed to parse CACD at all? Why not just
pick the domain ID from RMDD directly and use it directly as domain ID?

Later the domain ID from RMDD is just used in error messages ... never actually used
as a domain ID to guide the implementation so the correlation appears to be implicit with
this CACD parsing not intuitive.


You are right that the cache ID ultimately comes from get_cpu_cacheinfo_id(),
not from CACD directly - the function name is misleading and should be improved.

However, my understanding is that CACD parsing acts as the bridge between ACPI's
RMDD domain namespace and the kernel's L3 domain ID: CACD enumerates which CPUs
belong to each RMDD, and from those CPUs we derive the kernel-side L3 cache ID.
The specification defines an RMDD DomainID as an identifier unique within the
ERDT table, yet it does not guarantee that this ID matches the kernel’s cache ID
obtained via CPUID leaf 0x4. For this reason, we omitted the RMDD domain ID in
the current release.


+static void __iomem *erdt_ioremap_checked(phys_addr_t base, u32 size, const char *desc)

"size" -> "num_pages"?


OK, will change the name.

+{
+ void __iomem *addr = ioremap(base, size << 12);

The 12 magic constant is not ideal. Seems like PAGE_SHIFT is avoided since this *has* to be
4K? Would "num_pages * SZ_4K" be more accurate? To be most robust please consider check_mul_overflow()
or check_shl_overflow().


OK, will change it to use SZ_4K and check the overflow.

It is not clear to me what the "_checked" of the function name implies but I do not see
the parameters checked. For example, base could be checked for 4KB alignment?


The _checked suffix was intended to perform error checking for ioremap, not the 4KB
alignment check. I will remove this suffix.

+
+ if (!addr) {
+ pr_err("ERDT: Failed to map %s at phys addr %#llx (size: %u pages)\n",

phys_addt_t is printed with %pa


OK, will do.

+ domain_info = kzalloc(sizeof(*domain_info), GFP_KERNEL);

Please use kzalloc_obj().
commit 2932ba8d9c99 ("slab: Introduce kmalloc_obj() and family") contains details about benefits.


OK, thanks for sharing, will do.

+ if (!domain_info)
+ return false;
+
+ domain_info->base[ERDT_MMIO_RMDD_CREG] =
+ erdt_ioremap_checked(rmdd->creg_base, rmdd->creg_size, "RMDD ctrl base");
+ if (!domain_info->base[ERDT_MMIO_RMDD_CREG])
+ goto cleanup;
+
+ for (subtbl = rmdd_subtbl(rmdd);
+ subtbl_valid((void *)rmdd + rmdd->header.length, subtbl);

This does not look safe to me. What will happen if RMDD is empty? Would subtbl not
point to end of the table and then subtbl_valid() would dereference pointer beyond the
end of the table to do its checks?


I see. I will switch to the following version and add comments to clarify this
check:
static inline bool subtbl_valid(void *end, struct acpi_subtbl_hdr_16 *subtbl)
{
/* Ensure the header is within bounds before dereferencing it. */
if ((void *)subtbl + sizeof(*subtbl) > end)
return false;

/* A sub-table must be at least as large as its header. */
if (subtbl->length < sizeof(*subtbl))
return false;

/* The entire sub-table (including body) must fit within the parent. */
if ((void *)subtbl + subtbl->length > end)
return false;

return true;
}

+static void erdt_cleanup(void)
+{
+ struct erdt_domain_info *d;
+ unsigned long index;
+

Looks like the initialization via get_rdt_resources() does not do any error
checking so there may be nothing to clean up and certainly there will be nothing
to clean up if ERDT is not supported. Can the cleanup have a short circuit here to
only clean up if the array is not empty or perhaps skip cleanup if !erdt_enabled?


OK, let me add !erdt_enabled in erdt_cleanup().

+ xa_for_each(&erdt_domain_xa, index, d)
+ cleanup_one_domain(d);
+ xa_destroy(&erdt_domain_xa);
+}
+
+static __init int enumerate_erdt_table(struct acpi_table_header *table_hdr)
+{
+ struct acpi_table_erdt *erdt = (struct acpi_table_erdt *)table_hdr;
+ struct acpi_subtbl_hdr_16 *subtbl;
+ void *table_end;
+
+ if (erdt->header.revision != ERDT_VALID_VERSION) {
+ pr_info("Unknown ERDT table revision %d\n", erdt->header.revision);
(nit: "Unknown" -> "Unsupported"?)


OK, will change it.

+ return -EINVAL;
+ }
+
+ if (erdt->header.length < sizeof(*erdt)) {
+ pr_info(FW_BUG "ERDT: Invalid table length %u\n", erdt->header.length);

Here is an example of a "pr_info()" FW_BUG. It may help to add a unit
and not just report "length"


OK, will use pr_warn() and add a "bytes".

+void erdt_exit(void)
+{
+ erdt_cleanup();
+}

Why is erdt_exit() and erdt_cleanup() both needed?


Previously erdt_cleanup() was supposed to be used internally by erdt.c,
and erdt_exit() was exposed for non-arch code. But yes, we can use only
one, let me change it.

thanks,
Chenyu