Re: [PATCH v2 3/8] cxl/edac: Add CXL memory device patrol scrub control feature
From: Dan Williams
Date: Wed Mar 26 2025 - 21:51:58 EST
shiju.jose@ wrote:
> From: Shiju Jose <shiju.jose@xxxxxxxxxx>
>
> CXL spec 3.2 section 8.2.10.9.11.1 describes the device patrol scrub
> control feature. The device patrol scrub proactively locates and makes
> corrections to errors in regular cycle.
>
> Allow specifying the number of hours within which the patrol scrub must be
> completed, subject to minimum and maximum limits reported by the device.
> Also allow disabling scrub allowing trade-off error rates against
> performance.
>
> Add support for patrol scrub control on CXL memory devices.
> Register with the EDAC device driver, which retrieves the scrub attribute
> descriptors from EDAC scrub and exposes the sysfs scrub control attributes
> to userspace. For example, scrub control for the CXL memory device
> "cxl_mem0" is exposed in /sys/bus/edac/devices/cxl_mem0/scrubX/.
>
> Additionally, add support for region-based CXL memory patrol scrub control.
> CXL memory regions may be interleaved across one or more CXL memory
> devices. For example, region-based scrub control for "cxl_region1" is
> exposed in /sys/bus/edac/devices/cxl_region1/scrubX/.
>
> Reviewed-by: Dave Jiang <dave.jiang@xxxxxxxxx>
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> Signed-off-by: Shiju Jose <shiju.jose@xxxxxxxxxx>
> ---
> drivers/cxl/Kconfig | 25 ++
> drivers/cxl/core/Makefile | 1 +
> drivers/cxl/core/edac.c | 474 ++++++++++++++++++++++++++++++++++++++
> drivers/cxl/core/region.c | 5 +
> drivers/cxl/cxlmem.h | 10 +
> drivers/cxl/mem.c | 4 +
> 6 files changed, 519 insertions(+)
> create mode 100644 drivers/cxl/core/edac.c
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 205547e5543a..b5ede1308425 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -113,6 +113,31 @@ config CXL_FEATURES
>
> If unsure say 'n'
>
> +config CXL_EDAC_MEM_FEATURES
> + bool "CXL: EDAC Memory Features"
> + depends on EXPERT
> + depends on CXL_MEM
> + depends on CXL_FEATURES
> + depends on EDAC >= CXL_BUS
> + depends on EDAC_SCRUB
> + help
> + The CXL EDAC memory feature control is optional and allows host
> + to control the EDAC memory features configurations of CXL memory
> + expander devices.
> +
> + When enabled 'cxl_mem' and 'cxl_region' EDAC devices are published
> + with memory scrub control attributes as described by
> + Documentation/ABI/testing/sysfs-edac-scrub.
> +
> + When enabled 'cxl_mem' EDAC devices are published with memory ECS
> + and repair control attributes as described by
> + Documentation/ABI/testing/sysfs-edac-ecs and
> + Documentation/ABI/testing/sysfs-edac-memory-repair respectively.
> +
> + Say 'y/m' if you have an expert need to change default settings
> + of a memory RAS feature established by the platform/device (eg.
> + scrub rates for the patrol scrub feature). otherwise say 'n'.
> +
> config CXL_PORT
> default CXL_BUS
> tristate
> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> index 139b349b3a52..9b86fb22e5de 100644
> --- a/drivers/cxl/core/Makefile
> +++ b/drivers/cxl/core/Makefile
> @@ -19,4 +19,5 @@ cxl_core-y += ras.o
> cxl_core-$(CONFIG_TRACING) += trace.o
> cxl_core-$(CONFIG_CXL_REGION) += region.o
> cxl_core-$(CONFIG_CXL_FEATURES) += features.o
> +cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o
> cxl_core-$(CONFIG_CXL_MCE) += mce.o
> diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
> new file mode 100644
> index 000000000000..5ec3535785e1
> --- /dev/null
> +++ b/drivers/cxl/core/edac.c
> @@ -0,0 +1,474 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * CXL EDAC memory feature driver.
> + *
> + * Copyright (c) 2024-2025 HiSilicon Limited.
> + *
> + * - Supports functions to configure EDAC features of the
> + * CXL memory devices.
> + * - Registers with the EDAC device subsystem driver to expose
> + * the features sysfs attributes to the user for configuring
> + * CXL memory RAS feature.
> + */
> +
> +#include <linux/cleanup.h>
> +#include <linux/edac.h>
> +#include <linux/limits.h>
> +#include <cxl/features.h>
> +#include <cxl.h>
> +#include <cxlmem.h>
> +#include "core.h"
> +
> +#define CXL_NR_EDAC_DEV_FEATURES 1
> +
> +static struct rw_semaphore *cxl_acquire(struct rw_semaphore *rwsem)
> +{
> + if (down_read_interruptible(rwsem))
> + return NULL;
> +
> + return rwsem;
> +}
> +
> +DEFINE_FREE(cxl_unlock, struct rw_semaphore *, if (_T) up_read(_T))
I know I suggested cxl_acquire() and cxl_unlock(), but this really is a
generic facility.
Let's call it rwsem_read_intr_acquire() and rwsem_read_release(), and
I'll follow up later with Peter to see if he wants this to graduate from
CXL.
Also, go ahead and define it in cxl.h for now as I think other places in
the subsystem could benefit from this approach.
> +
> +/*
> + * CXL memory patrol scrub control
> + */
> +struct cxl_patrol_scrub_context {
I like "patrol_scrub" spelled out here compared to "ps" used everywhere
else.
> + u8 instance;
> + u16 get_feat_size;
> + u16 set_feat_size;
> + u8 get_version;
> + u8 set_version;
> + u16 effects;
> + struct cxl_memdev *cxlmd;
> + struct cxl_region *cxlr;
> +};
> +
> +/**
> + * struct cxl_memdev_ps_params - CXL memory patrol scrub parameter data structure.
> + * @enable: [IN & OUT] enable(1)/disable(0) patrol scrub.
> + * @scrub_cycle_changeable: [OUT] scrub cycle attribute of patrol scrub is changeable.
> + * @scrub_cycle_hrs: [IN] Requested patrol scrub cycle in hours.
> + * [OUT] Current patrol scrub cycle in hours.
> + * @min_scrub_cycle_hrs:[OUT] minimum patrol scrub cycle in hours supported.
> + */
> +struct cxl_memdev_ps_params {
> + bool enable;
> + bool scrub_cycle_changeable;
This is set but unused. Even if it were to be used I would expect it to
be set in the cxl_patrol_scrub_context.
> + u8 scrub_cycle_hrs;
> + u8 min_scrub_cycle_hrs;
> +};
I do not understand the point of this extra object and would prefer to
keep intermediate data structures to a minimum.
It looks like all this does is provide for short lived parsed caching of
the raw hardware patrol scrube attributes. Just pass those raw objects
around and provide helpers to do the conversion.
The less data structures the less confusion for the next person that has
to read this code a few years down the road.
> +
> +enum cxl_scrub_param {
> + CXL_PS_PARAM_ENABLE,
> + CXL_PS_PARAM_SCRUB_CYCLE,
> +};
This seems unforuntate, why not make non-zero scrub rate an implied
enable and zero to disable? A non-zero sentinel value like U32_MAX can
indicate "keep scrub rate unchanged".
> +#define CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK BIT(0)
This CXL_MEMDEV_PS prefix is awkward due to overload with 'struct
cxl_memdev'. Minimize it to something like:
CXL_SCRUB_CONTROL_CHANGEABLE
CXL_SCRUB_CONTROL_REALTIME
CXL_SCRUB_CONTROL_CYCLE_MASK
CXL_SCRUB_CONTROL_MIN_CYCLE_MASK
> +#define CXL_MEMDEV_PS_SCRUB_CYCLE_REALTIME_REPORT_CAP_MASK BIT(1)
> +#define CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK GENMASK(7, 0)
> +#define CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK GENMASK(15, 8)
> +#define CXL_MEMDEV_PS_FLAG_ENABLED_MASK BIT(0)
CXL_SCRUB_CONTROL_ENABLE
...no need to call it a mask when it is just a single-bit, and when it
is both the status and the control just call it "enable".
> +
> +/*
> + * See CXL spec rev 3.2 @8.2.10.9.11.1 Table 8-222 Device Patrol Scrub Control
> + * Feature Readable Attributes.
> + */
> +struct cxl_memdev_ps_rd_attrs {
> + u8 scrub_cycle_cap;
> + __le16 scrub_cycle_hrs;
"hours" is just 2 more characters than "hrs", I think we can afford the
extra bytes.
[..]
> +int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd)
> +{
> + struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
> + int num_ras_features = 0;
> + u8 scrub_inst = 0;
> + int rc;
> +
> + rc = cxl_memdev_scrub_init(cxlmd, &ras_features[num_ras_features],
> + scrub_inst);
> + if (rc < 0 && rc != -EOPNOTSUPP)
> + return rc;
> +
> + if (rc != -EOPNOTSUPP)
> + num_ras_features++;
> +
> + char *cxl_dev_name __free(kfree) =
> + kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlmd->dev));
if (!cxl_dev_name)
return -ENOMEM;
> +
> + return edac_dev_register(&cxlmd->dev, cxl_dev_name, NULL,
> + num_ras_features, ras_features);
> +}
> +EXPORT_SYMBOL_NS_GPL(devm_cxl_memdev_edac_register, "CXL");
> +
> +int devm_cxl_region_edac_register(struct cxl_region *cxlr)
> +{
> + struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
> + int num_ras_features = 0;
> + u8 scrub_inst = 0;
> + int rc;
> +
> + rc = cxl_region_scrub_init(cxlr, &ras_features[num_ras_features],
> + scrub_inst);
> + if (rc < 0)
> + return rc;
> +
> + num_ras_features++;
> +
> + char *cxl_dev_name __free(kfree) =
> + kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlr->dev));
> +
> + return edac_dev_register(&cxlr->dev, cxl_dev_name, NULL,
> + num_ras_features, ras_features);
> +}
> +EXPORT_SYMBOL_NS_GPL(devm_cxl_region_edac_register, "CXL");
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index b3260d433ec7..2aa6eb675fdf 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3542,6 +3542,11 @@ static int cxl_region_probe(struct device *dev)
> case CXL_PARTMODE_PMEM:
> return devm_cxl_add_pmem_region(cxlr);
> case CXL_PARTMODE_RAM:
> + rc = devm_cxl_region_edac_register(cxlr);
Why do only volatile regions get EDAC support? PMEM patrol scrub seems
equally valid.