Re: [PATCH v2 2/5] cxl/hdm: Implement address translation for HDM decoding

From: Dan Williams
Date: Thu Jul 11 2024 - 21:03:32 EST


Robert Richter wrote:
> Default expectation of Linux is that HPA == SPA, which means that
> hardware addresses in the decoders are the same as the kernel sees
> them. However, there are platforms where this is not the case and an
> address translation between decoder's (HPA) and the system's physical
> addresses (SPA) is needed.
>
> The CXL driver stores all internal hardware address ranges as SPA.
> When accessing the HDM decoder's registers, hardware addresses must be
> translated back and forth. This is needed for the base addresses in
> the CXL Range Registers of the PCIe DVSEC for CXL Devices
> (CXL_DVSEC_RANGE_BASE*) or the CXL HDM Decoder Capability Structure
> (CXL_HDM_DECODER0_BASE*).
>
> To handle address translation the kernel needs to keep track of the
> system's base HPA the decoder bases on. The base can be different
> between memory domains, each port may have its own domain. Thus, the
> HPA base cannot be shared between CXL ports and its decoders, instead
> the base HPA must be stored per port. Each port has its own struct
> cxl_hdm to handle the sets of decoders and targets, that struct can
> also be used for storing the base.
>
> Add @base_hpa to struct cxl_hdm. Also Introduce helper functions for
> the translation and use them to convert the HDM decoder base addresses
> to or from an SPA.
>
> While at this, rename len to size for the common base/size naming used
> with ranges.
>
> Link: https://lore.kernel.org/all/65c6b8c9a42e4_d2d4294f1@xxxxxxxxxxxxxxxxxxxxxxxxx.notmuch/
> Suggested-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> Signed-off-by: Robert Richter <rrichter@xxxxxxx>
> ---
> drivers/cxl/core/hdm.c | 69 ++++++++++++++++++++++++++++++++++--------
> drivers/cxl/cxlmem.h | 1 +
> 2 files changed, 57 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 605da9a55d89..50078013f4e3 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -125,6 +125,17 @@ static bool should_emulate_decoders(struct cxl_endpoint_dvsec_info *info)
> return true;
> }
>
> +static void setup_base_hpa(struct cxl_hdm *cxlhdm)
> +{
> + /*
> + * Address translation is not needed on platforms with HPA ==
> + * SPA. HDM decoder addresses all base on system addresses,
> + * there is no offset and the base is zero (cxlhdm->base_hpa
> + * == 0). Nothing to do here as it is already pre-initialized
> + * zero.
> + */
> +}

I am not grokking why this is per @cxlhdm? More on this concern below...

> + base = cxl_xlat_to_base(cxld->hpa_range.start, cxlhdm);

Would prefer this operation is called cxl_hpa_to_spa(), which is
different than the cxldrd->hpa_to_spa() in Alison's patches. This is
about an HPA domain per-host-bridge.

The cxlrd->hpa_to_spa() is after the address exits the host bridge there
is a translation to a Window interleave. Both of those are "SPAs" in my
mind.

> size = range_len(&cxld->hpa_range);
>
> writel(upper_32_bits(base), hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(id));
> @@ -746,22 +776,27 @@ static int cxl_setup_hdm_decoder_from_dvsec(
> struct cxl_port *port, struct cxl_decoder *cxld, u64 *dpa_base,
> int which, struct cxl_endpoint_dvsec_info *info)
> {
> + struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
> struct cxl_endpoint_decoder *cxled;
> - u64 len;
> + u64 base, size;

Please don't include renames like this s/@len/@size/ in the same patch
that changes some logic.

> int rc;
>
> if (!is_cxl_endpoint(port))
> return -EOPNOTSUPP;
>
> cxled = to_cxl_endpoint_decoder(&cxld->dev);
> - len = range_len(&info->dvsec_range[which]);
> - if (!len)
> + size = range_len(&info->dvsec_range[which]);
> + if (!size)
> return -ENOENT;
> + base = cxl_xlat_to_hpa(info->dvsec_range[which].start, cxlhdm);

Wait, the dvsec_range addresses are read from the registers, they are
already HPAs, shouldn't this be the "to SPA" flavor translation?

> cxld->target_type = CXL_DECODER_HOSTONLYMEM;
> cxld->commit = NULL;
> cxld->reset = NULL;
> - cxld->hpa_range = info->dvsec_range[which];
> + cxld->hpa_range = (struct range) {
> + .start = base,
> + .end = base + size -1,
> + };

I think it is confusing to change the software tracking of 'struct
cxl_decoder' to be in SPA, can this be kept as HPA tracking and then
fixup the locations that compare against SPAs, like CFWMS values and the
iomem_resource tree, to do the conversion?

>
> /*
> * Set the emulated decoder as locked pending additional support to
> @@ -770,14 +805,14 @@ static int cxl_setup_hdm_decoder_from_dvsec(
> cxld->flags |= CXL_DECODER_F_ENABLE | CXL_DECODER_F_LOCK;
> port->commit_end = cxld->id;
>
> - rc = devm_cxl_dpa_reserve(cxled, *dpa_base, len, 0);
> + rc = devm_cxl_dpa_reserve(cxled, *dpa_base, size, 0);
> if (rc) {
> dev_err(&port->dev,
> "decoder%d.%d: Failed to reserve DPA range %#llx - %#llx\n (%d)",
> - port->id, cxld->id, *dpa_base, *dpa_base + len - 1, rc);
> + port->id, cxld->id, *dpa_base, *dpa_base + size - 1, rc);
> return rc;
> }
> - *dpa_base += len;
> + *dpa_base += size;
> cxled->state = CXL_DECODER_STATE_AUTO;
>
> return 0;
> @@ -787,6 +822,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
> int *target_map, void __iomem *hdm, int which,
> u64 *dpa_base, struct cxl_endpoint_dvsec_info *info)
> {
> + struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
> struct cxl_endpoint_decoder *cxled = NULL;
> u64 size, base, skip, dpa_size, lo, hi;
> bool committed;
> @@ -823,6 +859,9 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
>
> if (info)
> cxled = to_cxl_endpoint_decoder(&cxld->dev);
> +
> + base = cxl_xlat_to_hpa(base, cxlhdm);
> +
> cxld->hpa_range = (struct range) {
> .start = base,
> .end = base + size - 1,
> @@ -1107,16 +1146,20 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
> }
>
> for (i = 0, allowed = 0; info->mem_enabled && i < info->ranges; i++) {
> - struct device *cxld_dev;
> -
> - cxld_dev = device_find_child(&root->dev, &info->dvsec_range[i],
> - dvsec_range_allowed);
> + u64 base = cxl_xlat_to_hpa(info->dvsec_range[i].start, cxlhdm);
> + u64 size = range_len(&info->dvsec_range[i]);
> + struct range hpa_range = {
> + .start = base,
> + .end = base + size -1,
> + };
> + struct device *cxld_dev __free(put_device) =
> + cxld_dev = device_find_child(&root->dev, &hpa_range,
> + dvsec_range_allowed);
> if (!cxld_dev) {
> dev_dbg(dev, "DVSEC Range%d denied by platform\n", i);
> continue;
> }
> dev_dbg(dev, "DVSEC Range%d allowed by platform\n", i);
> - put_device(cxld_dev);

Jarring to see this cleanup in the same patch. It deserves to be its own
standalone cleanup patch.

> allowed++;
> }
>
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 61d9f4e00921..849ea97385c9 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -856,6 +856,7 @@ struct cxl_hdm {
> unsigned int decoder_count;
> unsigned int target_count;
> unsigned int interleave_mask;
> + u64 base_hpa;

So, per the concern above, each @cxlhdm instance exists within a port
hierarchy. It is only the top of that port hiearchy that understands
that everything underneath is within it's own CXL HPA address domain.

So I would expect that only place this value needs to be stored is in
the cxl_port objects associated with host-bridges. The only place it
would need to be considered is when comparing iomem_resource and region
addresses with decoder addresses.

In other words, I think it is potentially mentally taxing to remember
that 'struct cxl_decoder' stores translated addresses vs teaching paths
that compare region address about the translation needed for endpoint
decoders.