Re: [PATCH 2/3] cxl/core: Enable Region creation on x86 with Low Memory Hole

From: Li Ming
Date: Mon Nov 25 2024 - 21:14:01 EST




On 11/26/2024 1:22 AM, Fabio M. De Francesco wrote:
> On Monday, November 25, 2024 9:42:56 AM GMT+1 Li Ming wrote:
>>
>> On 11/22/2024 11:51 PM, Fabio M. De Francesco wrote:
>>> The CXL Fixed Memory Window Structure (CFMWS) describes zero or more Host
>>> Physical Address (HPA) windows that are associated with each CXL Host
>>> Bridge. Each window represents a contiguous HPA that may be interleaved
>>> with one or more targets (CXL v3.1 - 9.18.1.3).
>>>
>>> The Low Memory Hole (LMH) of x86 is a range of addresses of physical low
>>> memory to which systems cannot send transactions. In some cases the size
>>> of that hole is not compatible with the CXL hardware decoder constraint
>>> that the size is always aligned to 256M * Interleave Ways.
>>>
>>> On those systems, BIOS publishes CFMWS which communicate the active System
>>> Physical Address (SPA) ranges that map to a subset of the Host Physical
>>> Address (HPA) ranges. The SPA range trims out the hole, and capacity in
>>> the endpoint is lost with no SPA to map to CXL HPA in that hole.
>>>
>>> In the early stages of CXL Regions construction and attach on platforms
>>> with Low Memory Holes, cxl_add_to_region() fails and returns an error
>>> because it can't find any CXL Window that matches a given CXL Endpoint
>>> Decoder.
>>>
>>> Detect Low Memory Holes by comparing Root Decoders and Endpoint Decoders
>>> ranges: both must start at physical address 0 and end below 4 GB, while
>>> the Root Decoder ranges end at lower addresses than the matching Endpoint
>>> Decoders which instead must always respect the above-mentioned CXL hardware
>>> decoders HPA alignment constraint.
>>
>> Hi Fabio,
>>
>> Here mentions that the end address must be below 4GB, but I don't find any checking about that in the implementation, is it not needed?
>>
> Hi Ming,
>
> Good question, thanks!
>
> While a first version of arch_match_spa() had a check of 'r2->end < SZ_4G',
> I dropped it for the final one. It ended up out of sync with the commit message.
>
> I think that we don't want that check. I'll rework the commit message for v2.
>
> If the hardware decoders HPA ranges extended beyond the end of Low
> Memory, the LMH would still need to be detected and the decoders capacity
> would still need to be trimmed to match their corresponding CFMWS range end.
>>
>> [snip]
>>
>>
>>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>>> index ac2c486c16e9..3cb5a69e9731 100644
>>> --- a/drivers/cxl/core/region.c
>>> +++ b/drivers/cxl/core/region.c
>>> @@ -836,8 +836,12 @@ static int match_auto_decoder(struct device *dev, void *data)
>>> cxld = to_cxl_decoder(dev);
>>> r = &cxld->hpa_range;
>>>
>>> - if (p->res && p->res->start == r->start && p->res->end == r->end)
>>> - return 1;
>>> + if (p->res) {
>>> + if (p->res->start == r->start && p->res->end == r->end)
>>> + return 1;
>>> + if (arch_match_region(p, cxld))
>>> + return 1;
>>> + }
>>
>> I think that it is better to check LMH cases before checking (p->res->start == r->start && p->res->end == r->end).
>> Per the changelog and the implementation of arch_match_region(), below case should fails but current checking will succeed:
>> the value of 'p->res->start' is MISALIGNED_CFMWS_RANGE_BASE and the the value of 'p->res->end' is equals to the value of 'r->end'.
>>
> I think that the expected "normal" case should always come first. In the expected
> scenarios the driver deals either with SPA range == HPA range
> (e.g, in match_auto_decoder()) or with SPA range that fully contains the HPA range
> (match_decoder_by_range()).
>
> If either one of those two cases holds, arch_match_*() helper doesn't need to be
> called and the match must succeed.
>
> Besides, other architectures are free to define holes with many constraints that
> the driver doesn't want to check first if the expected case is already met.
>>
>>>
>>> return 0;
>>> }
>>> @@ -1414,7 +1418,8 @@ static int cxl_port_setup_targets(struct cxl_port *port,
>>> if (cxld->interleave_ways != iw ||
>>> cxld->interleave_granularity != ig ||
>>> cxld->hpa_range.start != p->res->start ||
>>> - cxld->hpa_range.end != p->res->end ||
>>> + (cxld->hpa_range.end != p->res->end &&
>>> + !arch_match_region(p, cxld)) ||
>>> ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
>>> dev_err(&cxlr->dev,
>>> "%s:%s %s expected iw: %d ig: %d %pr\n",
>>> @@ -1726,6 +1731,7 @@ static int match_switch_decoder_by_range(struct device *dev, void *data)
>>> {
>>> struct cxl_endpoint_decoder *cxled = data;
>>> struct cxl_switch_decoder *cxlsd;
>>> + struct cxl_root_decoder *cxlrd;
>>> struct range *r1, *r2;
>>>
>>> if (!is_switch_decoder(dev))
>>> @@ -1735,8 +1741,13 @@ static int match_switch_decoder_by_range(struct device *dev, void *data)
>>> r1 = &cxlsd->cxld.hpa_range;
>>> r2 = &cxled->cxld.hpa_range;
>>>
>>> - if (is_root_decoder(dev))
>>> - return range_contains(r1, r2);
>>> + if (is_root_decoder(dev)) {
>>> + if (range_contains(r1, r2))
>>> + return 1;
>>> + cxlrd = to_cxl_root_decoder(dev);
>>> + if (arch_match_spa(cxlrd, cxled))
>>> + return 1;
>>
>> Same as above, what will happen if the root decoder's address range still contains endpoint decoder's address range in LMH cases? should fails or succeed?
>>
> If r1 contains r2, there is no LMH and the driver is dealing with the regular,
> expected, case. It must succeed.
>
> Think of the arch_match_*() helpers like something that avoid unwanted
> failures if arch permits exceptions. Before returning errors when the "normal"
> tests fail, check if the arch allows any exceptions and make the driver
> ignore that "strange" SPA/HPA misalignment.
>>
>> [snip]
>>
> Thanks,
>
> Fabio
>
>

Understand it, thanks for your explanation.

Ming

>
>
>