Re: [LSF/MM] CXL Boot to Bash - Section 0: ACPI and Linux Resources
From: Gregory Price
Date: Thu Mar 06 2025 - 12:09:11 EST
On Thu, Mar 06, 2025 at 09:37:49AM +0800, Yuquan Wang wrote:
> On Wed, Mar 05, 2025 at 05:20:52PM -0500, Gregory Price wrote:
First, thank you for bringing this up, this is exactly the type of
ambiguity I was hoping others would contribute. It's difficult to
figure out if the ACPI tables are "Correct", if there are unimplemented
features, or if we're doing something wrong - because some of this is
undocumented theory of operation.
> > ==================
> > NUMA node creation
> > ==================
> > NUMA nodes are *NOT* hot-pluggable. All *POSSIBLE* NUMA nodes are
> > identified at `__init` time, more specifically during `mm_init`.
> >
> > What this means is that the CEDT and SRAT must contain sufficient
> > `proximity domain` information for linux to identify how many NUMA
> > nodes are required (and what memory regions to associate with them).
> >
> Condition:
> 1) A UMA/NUMA system where the SRAT is absent but CEDT.CFMWS is present
> 2) CONFIG_ACPI_NUMA is enabled
>
> Results:
> 1) acpi_numa_init: fake_pxm will be 0 and is passed to acpi_parse_cfmws()
> 2) If a CXL RAM region is created dynamically, the CXL memory is assigned
> to node0 rather than to a new fake node.
>
This is very interesting. Can I ask a few questions:
1) Is this real hardware or a VM?
2) By `dynamic creation`, do you mean leveraging cxl-cli (ndctl)?
2a) Is the BIOS programming the decoders, or are you programming the
decoders after boot?
> Confusions:
> 1) Does CXL memory usage require a NUMA system with an SRAT? As you
> mentioned in the SRAT section:
>
> "This table is technically optional, but for performance information
> to be enumerated by linux it must be present."
>
> Hence, as I understand it, this seems to be a bug in the kernel.
>
It's hard to say if this is a bug yet. It's either a bug, or your
system should have an SRAT to describe what the BIOS has done.
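
For anyone following along, here is a minimal user-space sketch of how I
understand the fake_pxm selection described above: the first fake PXM is
one past the highest PXM the SRAT described, and each CFMWS window the
SRAT does not cover takes the next fake PXM. This is a simplified model
under those assumptions, not the kernel code. `struct numa_model` and the
`model_*` helpers are hypothetical names, and the model ignores memory
ranges entirely.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_NODES 8
#define MODEL_NO_PXM    (-1)

/* Hypothetical stand-in for a node -> proximity domain (PXM) table. */
struct numa_model {
	int node_to_pxm[MODEL_MAX_NODES];
	int nr_nodes;
};

static void model_init(struct numa_model *m)
{
	for (int i = 0; i < MODEL_MAX_NODES; i++)
		m->node_to_pxm[i] = MODEL_NO_PXM;
	m->nr_nodes = 0;
}

/*
 * Return the node for @pxm, allocating the next free node id the first
 * time a PXM is seen. Returns -1 when the table is full.
 */
static int model_map_pxm(struct numa_model *m, int pxm)
{
	for (int i = 0; i < m->nr_nodes; i++)
		if (m->node_to_pxm[i] == pxm)
			return i;
	if (m->nr_nodes >= MODEL_MAX_NODES)
		return -1;
	m->node_to_pxm[m->nr_nodes] = pxm;
	return m->nr_nodes++;
}

/* First unused PXM: one past the highest PXM recorded so far. */
static int model_first_fake_pxm(const struct numa_model *m)
{
	int fake_pxm = MODEL_NO_PXM;

	for (int i = 0; i < MODEL_MAX_NODES; i++)
		if (m->node_to_pxm[i] > fake_pxm)
			fake_pxm = m->node_to_pxm[i];
	return fake_pxm + 1;
}

/*
 * Model a CFMWS window that no SRAT entry covers: map it to the current
 * fake PXM, then advance the fake PXM for the next window.
 */
static int model_add_cfmws(struct numa_model *m, int *fake_pxm)
{
	int node;

	assert(fake_pxm != NULL);
	node = model_map_pxm(m, *fake_pxm);
	if (node >= 0)
		(*fake_pxm)++;
	return node;
}
```

In this model, an empty table (no SRAT) makes model_first_fake_pxm()
return 0, so the first uncovered window maps to PXM 0 and node 0, which
matches the result reported above. If PXM 0 and PXM 1 are mapped first
(as an SRAT would do), the first fake PXM is 2 and the window gets its
own node.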
> 2) If it is a bug, could we forbid this situation by adding a fake_pxm
> check and returning an error in acpi_numa_init()?
>
> 3) If not, maybe we can add some kernel logic to allow creating these
> fake nodes on a system without an SRAT?
>
I think we should at least provide a warning (if the SRAT is expected
but missing) - but let's get some more information first.
~Gregory