Re: [PATCH v14 07/14] cxl/memfeature: Add CXL memory device patrol scrub control feature

From: Dave Jiang
Date: Tue Oct 29 2024 - 14:33:05 EST




On 10/29/24 10:00 AM, Shiju Jose wrote:
>
>
>> -----Original Message-----
>> From: Dave Jiang <dave.jiang@xxxxxxxxx>
>> Sent: 29 October 2024 16:32
>> To: Shiju Jose <shiju.jose@xxxxxxxxxx>; linux-edac@xxxxxxxxxxxxxxx; linux-
>> cxl@xxxxxxxxxxxxxxx; linux-acpi@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; linux-
>> kernel@xxxxxxxxxxxxxxx
>> Cc: bp@xxxxxxxxx; tony.luck@xxxxxxxxx; rafael@xxxxxxxxxx; lenb@xxxxxxxxxx;
>> mchehab@xxxxxxxxxx; dan.j.williams@xxxxxxxxx; dave@xxxxxxxxxxxx; Jonathan
>> Cameron <jonathan.cameron@xxxxxxxxxx>; gregkh@xxxxxxxxxxxxxxxxxxx;
>> sudeep.holla@xxxxxxx; jassisinghbrar@xxxxxxxxx; alison.schofield@xxxxxxxxx;
>> vishal.l.verma@xxxxxxxxx; ira.weiny@xxxxxxxxx; david@xxxxxxxxxx;
>> Vilas.Sridharan@xxxxxxx; leo.duran@xxxxxxx; Yazen.Ghannam@xxxxxxx;
>> rientjes@xxxxxxxxxx; jiaqiyan@xxxxxxxxxx; Jon.Grimm@xxxxxxx;
>> dave.hansen@xxxxxxxxxxxxxxx; naoya.horiguchi@xxxxxxx;
>> james.morse@xxxxxxx; jthoughton@xxxxxxxxxx; somasundaram.a@xxxxxxx;
>> erdemaktas@xxxxxxxxxx; pgonda@xxxxxxxxxx; duenwen@xxxxxxxxxx;
>> gthelen@xxxxxxxxxx; wschwartz@xxxxxxxxxxxxxxxxxxx;
>> dferguson@xxxxxxxxxxxxxxxxxxx; wbs@xxxxxxxxxxxxxxxxxxxxxx;
>> nifan.cxl@xxxxxxxxx; tanxiaofei <tanxiaofei@xxxxxxxxxx>; Zengtao (B)
>> <prime.zeng@xxxxxxxxxxxxx>; Roberto Sassu <roberto.sassu@xxxxxxxxxx>;
>> kangkang.shen@xxxxxxxxxxxxx; wanghuiqiang <wanghuiqiang@xxxxxxxxxx>;
>> Linuxarm <linuxarm@xxxxxxxxxx>
>> Subject: Re: [PATCH v14 07/14] cxl/memfeature: Add CXL memory device patrol
>> scrub control feature
>>
>>
>>
>> On 10/25/24 10:13 AM, shiju.jose@xxxxxxxxxx wrote:
>>> From: Shiju Jose <shiju.jose@xxxxxxxxxx>
>>>
>>> CXL spec 3.1 section 8.2.9.9.11.1 describes the device patrol scrub
>>> control feature. The device patrol scrub proactively locates and makes
>>> corrections to errors in regular cycle.
>>>
>>> Allow specifying the number of hours within which the patrol scrub
>>> must be completed, subject to minimum and maximum limits reported by the device.
>>> Also allow disabling scrub allowing trade-off error rates against
>>> performance.
>>>
>>> Add support for patrol scrub control on CXL memory devices.
>>> Register with the EDAC device driver, which retrieves the scrub
>>> attribute descriptors from EDAC scrub and exposes the sysfs scrub
>>> control attributes to userspace. For example, scrub control for the
>>> CXL memory device "cxl_mem0" is exposed in /sys/bus/edac/devices/cxl_mem0/scrubX/.
>>>
>>> Additionally, add support for region-based CXL memory patrol scrub control.
>>> CXL memory regions may be interleaved across one or more CXL memory
>>> devices. For example, region-based scrub control for "cxl_region1" is
>>> exposed in /sys/bus/edac/devices/cxl_region1/scrubX/.
>>>
>>> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
>>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
>>> Signed-off-by: Shiju Jose <shiju.jose@xxxxxxxxxx>
>>> ---
>>> Documentation/edac/edac-scrub.rst | 74 ++++++
>>> drivers/cxl/Kconfig | 18 ++
>>> drivers/cxl/core/Makefile | 1 +
>>> drivers/cxl/core/memfeature.c | 381 ++++++++++++++++++++++++++++++
>>> drivers/cxl/core/region.c | 6 +
>>> drivers/cxl/cxlmem.h | 7 +
>>> drivers/cxl/mem.c | 4 +
>>> 7 files changed, 491 insertions(+)
>>> create mode 100644 Documentation/edac/edac-scrub.rst
>>> create mode 100644 drivers/cxl/core/memfeature.c
>>>
>>> diff --git a/Documentation/edac/edac-scrub.rst
>>> b/Documentation/edac/edac-scrub.rst
>>> new file mode 100644
>>> index 000000000000..4aad4974b208
>>> --- /dev/null
>>> +++ b/Documentation/edac/edac-scrub.rst
>>> @@ -0,0 +1,74 @@
>>> +.. SPDX-License-Identifier: GPL-2.0
>>> +
> [...]
>
>>> +static int cxl_mem_ps_get_attrs(struct cxl_memdev_state *mds,
>>> + struct cxl_memdev_ps_params *params) {
>>> + size_t rd_data_size = sizeof(struct cxl_memdev_ps_rd_attrs);
>>> + size_t data_size;
>>> + struct cxl_memdev_ps_rd_attrs *rd_attrs __free(kfree) =
>>> + kmalloc(rd_data_size, GFP_KERNEL);
>>> + if (!rd_attrs)
>>> + return -ENOMEM;
>>> +
>>> + data_size = cxl_get_feature(mds, cxl_patrol_scrub_uuid,
>>> + CXL_GET_FEAT_SEL_CURRENT_VALUE,
>>> + rd_attrs, rd_data_size);
>>> + if (!data_size)
>>> + return -EIO;
>>> +
>>> + params->scrub_cycle_changeable = FIELD_GET(CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK,
>>> + rd_attrs->scrub_cycle_cap);
>>> + params->enable = FIELD_GET(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
>>> + rd_attrs->scrub_flags);
>>> + params->scrub_cycle_hrs = FIELD_GET(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
>>> + rd_attrs->scrub_cycle_hrs);
>>> + params->min_scrub_cycle_hrs = FIELD_GET(CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK,
>>> + rd_attrs->scrub_cycle_hrs);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int cxl_ps_get_attrs(struct device *dev, void *drv_data,
>>
>> Would a union be better than a void *drv_data for all the places this is used as a
>> parameter? How many variations of this are there?
>>
>> DJ
> Hi Dave,
>
> Can you give more info on this given this is a generic callback for the scrub control and each
> implementation will have its own context struct (for eg. struct cxl_patrol_scrub_context here
> for CXL scrub control), which in turn will be passed in and out as opaque data.

Mainly I'm just seeing a lot of calls that take a (void *). I'm asking whether we want to make it a union that contains 'struct cxl_patrol_scrub_context' and the other context types instead.
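
To make the question concrete, here is a rough userspace sketch of the two shapes, not a proposal for the actual EDAC interface. The struct fields, 'struct other_scrub_context', 'struct scrub_ctx', and all function names below are made up for illustration; only 'struct cxl_patrol_scrub_context' is a name from the patch, and its contents here are a stand-in.

```c
#include <errno.h>

/* Stand-in for the real context struct; the fields are hypothetical. */
struct cxl_patrol_scrub_context {
	int instance;
	unsigned char cycle_hrs;
};

/* Hypothetical second scrub provider with its own context. */
struct other_scrub_context {
	unsigned int cycle_secs;
};

/*
 * Option A: opaque pointer, the shape used in the patch. The callback
 * trusts that drv_data is the type it registered with.
 */
static int cxl_get_cycle_opaque(void *drv_data, unsigned int *hrs)
{
	struct cxl_patrol_scrub_context *ctx = drv_data;

	*hrs = ctx->cycle_hrs;
	return 0;
}

/*
 * Option B: tagged union. The generic layer has to know every context
 * type, but a callback can verify that it was handed the right one.
 */
enum scrub_ctx_type {
	SCRUB_CTX_CXL_PATROL,
	SCRUB_CTX_OTHER,
};

struct scrub_ctx {
	enum scrub_ctx_type type;
	union {
		struct cxl_patrol_scrub_context *cxl_ps;
		struct other_scrub_context *other;
	};
};

static int cxl_get_cycle_union(const struct scrub_ctx *ctx, unsigned int *hrs)
{
	/* Reject a context that belongs to a different provider. */
	if (ctx->type != SCRUB_CTX_CXL_PATROL)
		return -EINVAL;

	*hrs = ctx->cxl_ps->cycle_hrs;
	return 0;
}
```

The trade-off in this sketch is that option B gives some type checking at the cost of the generic code having to list every implementation's context, which may be exactly what the opaque pointer is meant to avoid.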

>
> Thanks,
> Shiju
>>
>>> + struct cxl_memdev_ps_params *params) {
>>> + struct cxl_patrol_scrub_context *cxl_ps_ctx = drv_data;
>>> + struct cxl_memdev *cxlmd;
>>> + struct cxl_dev_state *cxlds;
>>> + struct cxl_memdev_state *mds;
>>> + u16 min_scrub_cycle = 0;
>>> + int i, ret;
>>> +
>>> + if (cxl_ps_ctx->cxlr) {
>>> + struct cxl_region *cxlr = cxl_ps_ctx->cxlr;
>>> + struct cxl_region_params *p = &cxlr->params;
>>> +
>>> + for (i = p->interleave_ways - 1; i >= 0; i--) {
>>> + struct cxl_endpoint_decoder *cxled = p->targets[i];
>>> +
>>> + cxlmd = cxled_to_memdev(cxled);
>>> + cxlds = cxlmd->cxlds;
>>> + mds = to_cxl_memdev_state(cxlds);
>>> + ret = cxl_mem_ps_get_attrs(mds, params);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + if (params->min_scrub_cycle_hrs > min_scrub_cycle)
>>> + min_scrub_cycle = params->min_scrub_cycle_hrs;
>>> + }
>>> + params->min_scrub_cycle_hrs = min_scrub_cycle;
>>> + return 0;
>>> + }
>>> + cxlmd = cxl_ps_ctx->cxlmd;
>>> + cxlds = cxlmd->cxlds;
>>> + mds = to_cxl_memdev_state(cxlds);
>>> +
>>> + return cxl_mem_ps_get_attrs(mds, params); }
>>> +
> [...]
>>
>