Re: [PATCH v2 2/5] cxl/hdm: Implement address translation for HDM decoding
From: Robert Richter
Date: Sat Jul 13 2024 - 03:41:00 EST
On 11.07.24 18:03:13, Dan Williams wrote:
> Robert Richter wrote:
> > Default expectation of Linux is that HPA == SPA, which means that
> > hardware addresses in the decoders are the same as the kernel sees
> > them. However, there are platforms where this is not the case and an
> > address translation between decoder's (HPA) and the system's physical
> > addresses (SPA) is needed.
> >
> > The CXL driver stores all internal hardware address ranges as SPA.
> > When accessing the HDM decoder's registers, hardware addresses must be
> > translated back and forth. This is needed for the base addresses in
> > the CXL Range Registers of the PCIe DVSEC for CXL Devices
> > (CXL_DVSEC_RANGE_BASE*) or the CXL HDM Decoder Capability Structure
> > (CXL_HDM_DECODER0_BASE*).
> >
> > To handle address translation the kernel needs to keep track of the
> > system's base HPA the decoder bases on. The base can be different
> > between memory domains, each port may have its own domain. Thus, the
> > HPA base cannot be shared between CXL ports and its decoders, instead
> > the base HPA must be stored per port. Each port has its own struct
> > cxl_hdm to handle the sets of decoders and targets, that struct can
> > also be used for storing the base.
> >
> > Add @base_hpa to struct cxl_hdm. Also Introduce helper functions for
> > the translation and use them to convert the HDM decoder base addresses
> > to or from an SPA.
> >
> > While at this, rename len to size for the common base/size naming used
> > with ranges.
> >
> > Link: https://lore.kernel.org/all/65c6b8c9a42e4_d2d4294f1@xxxxxxxxxxxxxxxxxxxxxxxxx.notmuch/
> > Suggested-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> > Signed-off-by: Robert Richter <rrichter@xxxxxxx>
> > ---
> > drivers/cxl/core/hdm.c | 69 ++++++++++++++++++++++++++++++++++--------
> > drivers/cxl/cxlmem.h | 1 +
> > 2 files changed, 57 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index 605da9a55d89..50078013f4e3 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> > @@ -125,6 +125,17 @@ static bool should_emulate_decoders(struct cxl_endpoint_dvsec_info *info)
> > return true;
> > }
> >
> > +static void setup_base_hpa(struct cxl_hdm *cxlhdm)
> > +{
> > + /*
> > + * Address translation is not needed on platforms with HPA ==
> > + * SPA. HDM decoder addresses all base on system addresses,
> > + * there is no offset and the base is zero (cxlhdm->base_hpa
> > + * == 0). Nothing to do here as it is already pre-initialized
> > + * zero.
> > + */
> > +}
>
> I am not grokking why this is per @cxlhdm? More on this concern below...
Yes, I was going to solve only this single use case. Moving that down
to the decoders as you suggested below would better generalize this
and code could be reused for other implementations or platforms where
conversion is needed.
>
> > + base = cxl_xlat_to_base(cxld->hpa_range.start, cxlhdm);
>
> Would prefer this operation is called cxl_hpa_to_spa(), which is
> different than the cxldrd->hpa_to_spa() in Alison's patches. This is
> about an HPA domain per-host-bridge.
>
> The cxlrd->hpa_to_spa() is after the address exits the host bridge there
> is a translation to a Window interleave. Both of those are "SPAs" in my
> mind.
I am going to align this with Alison's patches.
>
> > size = range_len(&cxld->hpa_range);
> >
> > writel(upper_32_bits(base), hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(id));
> > @@ -746,22 +776,27 @@ static int cxl_setup_hdm_decoder_from_dvsec(
> > struct cxl_port *port, struct cxl_decoder *cxld, u64 *dpa_base,
> > int which, struct cxl_endpoint_dvsec_info *info)
> > {
> > + struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
> > struct cxl_endpoint_decoder *cxled;
> > - u64 len;
> > + u64 base, size;
>
> Please don't include renames like this s/@len/@size/ in the same patch
> that changes some logic.
I will separate all cleanups in this patch.
>
> > int rc;
> >
> > if (!is_cxl_endpoint(port))
> > return -EOPNOTSUPP;
> >
> > cxled = to_cxl_endpoint_decoder(&cxld->dev);
> > - len = range_len(&info->dvsec_range[which]);
> > - if (!len)
> > + size = range_len(&info->dvsec_range[which]);
> > + if (!size)
> > return -ENOENT;
> > + base = cxl_xlat_to_hpa(info->dvsec_range[which].start, cxlhdm);
>
> Wait, the dvsec_range addresses are read from the registers, they are
> already HPAs, shouldn't this be the "to SPA" flavor translation?
Yes, the naming is a little confusing at all.
The intention here was somehow "corrected" hpa, as it is later used in
cxld->"hpa"_range, and that is meant to be hpa == spa. That will
change anyway with a rework.
>
> > cxld->target_type = CXL_DECODER_HOSTONLYMEM;
> > cxld->commit = NULL;
> > cxld->reset = NULL;
> > - cxld->hpa_range = info->dvsec_range[which];
> > + cxld->hpa_range = (struct range) {
> > + .start = base,
> > + .end = base + size -1,
> > + };
>
> I think it is confusing to change the software tracking of 'struct
> cxl_decoder' to be in SPA, can this be kept as HPA tracking and then
> fixup the locations that compare against SPAs, like CFWMS values and the
> iomem_resource tree, to do the conversion?
In sysfs the addresses should be shown as SPA. If hpa_range is always
an SPA there is the advantage that only the access to the registers
needs translation. If that changes, the accessors of hpa_range need
conversion. Need to check impact of this.
Possibly we could maintain both ranges, e.g. another spa_range that
contains cached values with the translated addresses.
That said, regarding naming, an HPA is an address that can be
read/written from/to the HDM decoder cap regs or the range regs,
right?
>
> >
> > /*
> > * Set the emulated decoder as locked pending additional support to
> > @@ -770,14 +805,14 @@ static int cxl_setup_hdm_decoder_from_dvsec(
> > cxld->flags |= CXL_DECODER_F_ENABLE | CXL_DECODER_F_LOCK;
> > port->commit_end = cxld->id;
> >
> > - rc = devm_cxl_dpa_reserve(cxled, *dpa_base, len, 0);
> > + rc = devm_cxl_dpa_reserve(cxled, *dpa_base, size, 0);
> > if (rc) {
> > dev_err(&port->dev,
> > "decoder%d.%d: Failed to reserve DPA range %#llx - %#llx\n (%d)",
> > - port->id, cxld->id, *dpa_base, *dpa_base + len - 1, rc);
> > + port->id, cxld->id, *dpa_base, *dpa_base + size - 1, rc);
> > return rc;
> > }
> > - *dpa_base += len;
> > + *dpa_base += size;
> > cxled->state = CXL_DECODER_STATE_AUTO;
> >
> > return 0;
> > @@ -787,6 +822,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
> > int *target_map, void __iomem *hdm, int which,
> > u64 *dpa_base, struct cxl_endpoint_dvsec_info *info)
> > {
> > + struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
> > struct cxl_endpoint_decoder *cxled = NULL;
> > u64 size, base, skip, dpa_size, lo, hi;
> > bool committed;
> > @@ -823,6 +859,9 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
> >
> > if (info)
> > cxled = to_cxl_endpoint_decoder(&cxld->dev);
> > +
> > + base = cxl_xlat_to_hpa(base, cxlhdm);
> > +
> > cxld->hpa_range = (struct range) {
> > .start = base,
> > .end = base + size - 1,
> > @@ -1107,16 +1146,20 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
> > }
> >
> > for (i = 0, allowed = 0; info->mem_enabled && i < info->ranges; i++) {
> > - struct device *cxld_dev;
> > -
> > - cxld_dev = device_find_child(&root->dev, &info->dvsec_range[i],
> > - dvsec_range_allowed);
> > + u64 base = cxl_xlat_to_hpa(info->dvsec_range[i].start, cxlhdm);
> > + u64 size = range_len(&info->dvsec_range[i]);
> > + struct range hpa_range = {
> > + .start = base,
> > + .end = base + size -1,
> > + };
> > + struct device *cxld_dev __free(put_device) =
> > + cxld_dev = device_find_child(&root->dev, &hpa_range,
> > + dvsec_range_allowed);
> > if (!cxld_dev) {
> > dev_dbg(dev, "DVSEC Range%d denied by platform\n", i);
> > continue;
> > }
> > dev_dbg(dev, "DVSEC Range%d allowed by platform\n", i);
> > - put_device(cxld_dev);
>
> Jarring to see this cleanup in the same patch. It deserves to be its own
> standalone cleanup patch.
Ok.
>
> > allowed++;
> > }
> >
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 61d9f4e00921..849ea97385c9 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -856,6 +856,7 @@ struct cxl_hdm {
> > unsigned int decoder_count;
> > unsigned int target_count;
> > unsigned int interleave_mask;
> > + u64 base_hpa;
>
> So, per the concern above, each @cxlhdm instance exists within a port
> hierarchy. It is only the top of that port hiearchy that understands
> that everything underneath is within it's own CXL HPA address domain.
>
> So I would expect that only place this value needs to be stored is in
> the cxl_port objects associated with host-bridges. The only place it
> would need to be considered is when comparing iomem_resource and region
> addresses with decoder addresses.
>
> In other words, I think it is potentially mentally taxing to remember
> that 'struct cxl_decoder' stores translated addresses vs teaching paths
> that compare region address about the translation needed for endpoint
> decoders.
Yes, that could be moved from HDM down to the decoders and translated
addresses being kept there. That approach is more flexible to also
solve other needs, such as XOR math translations or those of other
platforms.
So will look into this.
Thanks for review.
-Robert