Re: [PATCH v2 10/15] cxl/region: Use root decoders interleaving parameters to create a region

From: Gregory Price
Date: Mon Mar 31 2025 - 21:59:59 EST


On Tue, Feb 18, 2025 at 02:23:51PM +0100, Robert Richter wrote:
> @@ -1955,12 +1971,23 @@ static int cxl_port_calc_interleave(struct cxl_port *port,
> if (is_cxl_root(port))
> return 0;
>
> - rc = find_pos_and_ways(port, ctx->hpa_range, &parent_pos, &parent_ways);
> + rc = find_pos_and_ways(port, ctx->hpa_range, &parent_pos, &parent_ways,
> + &parent_granularity);
> if (rc)
> return rc;
>
> ctx->pos = ctx->pos * parent_ways + parent_pos;
>
> + if (ctx->interleave_ways)
> + ctx->interleave_ways *= parent_ways;
> + else
> + ctx->interleave_ways = parent_ways;
> +
> + if (ctx->interleave_granularity)
> + ctx->interleave_granularity *= ctx->interleave_ways;
> + else
> + ctx->interleave_granularity = parent_granularity;
> +
> return ctx->pos;
> }
>

I have discovered on my Zen5 that either this code is incorrect, or my
decoders are programmed incorrectly.

decoderN.M | iw  ig
----------------------
decoder0.0 |  2  256
decoder1.0 |  1  256
decoder3.0 |  1  256
decoder5.0 |  1  256
decoder6.0 |  1  256
region0    |  2  512   <--- Wrong

*Arch quirk aside*, everything except region is as expected.


I finally dropped a bunch of hacks from my branch, and my Zen5 stopped
bringing devices up correctly, with the error:

[]cxl region0: pci0000:d2:port1 cxl_port_setup_targets expected
iw: 1 ig: 1024 [... snip ...]
[]cxl region0: pci0000:d2:port1 cxl_port_setup_targets got
iw: 1 ig: 256 [... snip ...]

I sat here scratching my head over how I could possibly end up with an
ig of 1024 from the above set of decoders, until I realized the region
inherited interleave_ways/granularity from fields on the ENDPOINT
decoder that are not exposed in sysfs.

It took me a while to come back around and realize that this patch set
adds new ways/granularity fields to the endpoint decoder.

	struct cxl_endpoint_decoder {
		struct cxl_decoder cxld;
		...
		int interleave_ways;		/* new in this series */
		int interleave_granularity;	/* new in this series */
	};

	struct cxl_decoder {
		...
		int interleave_ways;
		int interleave_granularity;
	};


1) the cxl_endpoint_decoder descriptor needs to be updated to explain
why these ways/granularity differ from the cxl_decoder inside of the
cxl_endpoint_decoder. This is very, very confusing.

The reason appears to be that the endpoint decoder ways/granularity
is the region ways/granularity. So the endpoint decoder is passing
this information along.

Makes me think the region creation code should call this directly,
rather than all this indirection.


2) This calculation appears to just be plain wrong.


static int cxl_endpoint_decoder_initialize(struct cxl_endpoint_decoder *cxled)
{
	ctx = (struct cxl_interleave_context) {
		.hpa_range = &hpa,
	};
	...
	while (iter && parent) {
		/*
		 * endpoint      host bridge    root
		 * decoder6.0 -> decoder3.0  -> decoder0.0
		 */

		/* Convert interleave settings to next port upstream. */
		rc = cxl_port_calc_interleave(iter, &ctx);
		...
	}
	...
	cxled->interleave_ways = ctx.interleave_ways;
	cxled->interleave_granularity = ctx.interleave_granularity;
}

On my setup, I would expect to iterate decoder3.0 and then decoder0.0:

decoderN.M | iw  ig
----------------------
decoder0.0 |  2  256
decoder3.0 |  1  256

on entry                 [iw,ig]                    =  [0,0]
decoder3.0 (first hop)   [parent_ways, parent_gran] -> [1,256]
decoder0.0 (piw = 2)     [iw * piw, ig * piw]       -> [2,512]


Looking at a normal system, we'd expect this configuration:

decoderN.M | iw  ig
----------------------
decoder0.0 |  2  256
decoder1.0 |  1  512
decoder3.0 |  1  512
decoder5.0 |  2  256
decoder6.0 |  2  256

The above code produces the following:
after decoder3.0:  [1,512]
after decoder0.0:  [2,1024]  <--- still wrong
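
To double-check the arithmetic, here is a minimal user-space sketch of
the accumulation. Only the two if/else updates are taken from the quoted
hunk; struct hop, struct acc, accumulate() and walk_to_root() are
made-up names for illustration, not kernel code, and the sketch assumes
the walk visits the host bridge decoder first and the root decoder last.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one parent decoder seen during the walk. */
struct hop {
	int ways;
	int granularity;
};

/* Hypothetical stand-in for the two new interleave context fields. */
struct acc {
	int interleave_ways;
	int interleave_granularity;
};

/* Same update order as the quoted hunk: ways first, then granularity. */
static void accumulate(struct acc *ctx, const struct hop *parent)
{
	if (ctx->interleave_ways)
		ctx->interleave_ways *= parent->ways;
	else
		ctx->interleave_ways = parent->ways;

	/* Scaled by the accumulated ways, which were just updated above. */
	if (ctx->interleave_granularity)
		ctx->interleave_granularity *= ctx->interleave_ways;
	else
		ctx->interleave_granularity = parent->granularity;
}

/* Apply the update once per hop, host bridge first, root last. */
static struct acc walk_to_root(const struct hop *hops, size_t n)
{
	struct acc ctx = { 0, 0 };
	size_t i;

	assert(hops || n == 0);
	for (i = 0; i < n; i++)
		accumulate(&ctx, &hops[i]);
	return ctx;
}
```

Feeding it { {1,256}, {2,256} } (my Zen5) ends at [2,512], and
{ {1,512}, {2,256} } (the normal layout) ends at [2,1024], matching the
numbers above.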


In cxl_port_setup_targets we have this comment:

	if (is_cxl_root(parent_port)) {
		/*
		 * Root decoder IG is always set to value in CFMWS which
		 * may be different than this region's IG. We can use the
		 * region's IG here since interleave_granularity_store()
		 * does not allow interleaved host-bridges with
		 * root IG != region IG.
		 */
		parent_ig = p->interleave_granularity;
		parent_iw = cxlrd->cxlsd.cxld.interleave_ways;
	}


Can we not just always report the parent ways/granularity, and skip all
the math? We'll always iterate to the root, and that's what we want the
region to match, right?
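
Roughly what "skip all the math" could look like, as a user-space sketch
only: overwrite on every hop so the last decoder visited (the root)
wins. struct hop and report_root() are hypothetical names, and this
assumes no interleaving below the root, as in both example layouts
above. It is not a claim about how the kernel should handle switches or
interleaved host bridges.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one parent decoder seen during the walk. */
struct hop {
	int ways;
	int granularity;
};

/*
 * No multiplication: each hop simply replaces the previous value, so
 * the result is whatever the last hop (the root decoder) reports.
 * Returns {0, 0} when there are no hops.
 */
static struct hop report_root(const struct hop *hops, size_t n)
{
	struct hop out = { 0, 0 };
	size_t i;

	assert(hops || n == 0);
	for (i = 0; i < n; i++)
		out = hops[i];
	return out;
}
```

With the hops from either table, { {1,256}, {2,256} } or
{ {1,512}, {2,256} }, this ends at [2,256].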

Better yet, can we not just do this in the region creation code, rather
than having the endpoint carry it through to the region for some reason?
That would avoid adding the duplicate ways/granularity fields altogether.

~Gregory