RE: [PATCH v17 00/18] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers

From: Shiju Jose
Date: Fri Jan 03 2025 - 13:33:05 EST


>-----Original Message-----
>From: Dave Jiang <dave.jiang@xxxxxxxxx>
>Sent: 03 January 2025 15:50
>To: Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>; Borislav Petkov
><bp@xxxxxxxxx>
>Cc: Shiju Jose <shiju.jose@xxxxxxxxxx>; linux-edac@xxxxxxxxxxxxxxx; linux-
>cxl@xxxxxxxxxxxxxxx; linux-acpi@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; linux-
>kernel@xxxxxxxxxxxxxxx; tony.luck@xxxxxxxxx; rafael@xxxxxxxxxx;
>lenb@xxxxxxxxxx; mchehab@xxxxxxxxxx; dan.j.williams@xxxxxxxxx;
>dave@xxxxxxxxxxxx; alison.schofield@xxxxxxxxx; vishal.l.verma@xxxxxxxxx;
>ira.weiny@xxxxxxxxx; david@xxxxxxxxxx; Vilas.Sridharan@xxxxxxx;
>leo.duran@xxxxxxx; Yazen.Ghannam@xxxxxxx; rientjes@xxxxxxxxxx;
>jiaqiyan@xxxxxxxxxx; Jon.Grimm@xxxxxxx; dave.hansen@xxxxxxxxxxxxxxx;
>naoya.horiguchi@xxxxxxx; james.morse@xxxxxxx; jthoughton@xxxxxxxxxx;
>somasundaram.a@xxxxxxx; erdemaktas@xxxxxxxxxx; pgonda@xxxxxxxxxx;
>duenwen@xxxxxxxxxx; gthelen@xxxxxxxxxx;
>wschwartz@xxxxxxxxxxxxxxxxxxx; dferguson@xxxxxxxxxxxxxxxxxxx;
>wbs@xxxxxxxxxxxxxxxxxxxxxx; nifan.cxl@xxxxxxxxx; tanxiaofei
><tanxiaofei@xxxxxxxxxx>; Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>; Roberto
>Sassu <roberto.sassu@xxxxxxxxxx>; kangkang.shen@xxxxxxxxxxxxx;
>wanghuiqiang <wanghuiqiang@xxxxxxxxxx>; Linuxarm
><linuxarm@xxxxxxxxxx>
>Subject: Re: [PATCH v17 00/18] EDAC: Scrub: introduce generic EDAC RAS
>control feature driver + CXL/ACPI-RAS2 drivers
>
>
>
>On 1/3/25 6:02 AM, Jonathan Cameron wrote:
>> On Fri, 3 Jan 2025 12:41:45 +0100
>> Borislav Petkov <bp@xxxxxxxxx> wrote:
>>
>>> On Fri, Nov 22, 2024 at 06:03:57PM +0000, shiju.jose@xxxxxxxxxx wrote:
>>>> drivers/edac/Makefile | 1 +
>>>> drivers/edac/ecs.c | 207 +++
>>>> drivers/edac/edac_device.c | 183 ++
>>>> drivers/edac/mem_repair.c | 492 +++++
>>>> drivers/edac/scrub.c | 209 +++
>>>> drivers/ras/Kconfig | 10 +
>>>> drivers/ras/Makefile | 1 +
>>>> drivers/ras/acpi_ras2.c | 385 ++++
>>>> include/acpi/ras2_acpi.h | 45 +
>>>> include/cxl/features.h | 48 +
>>>> include/cxl/mailbox.h | 45 +-
>>>> include/linux/edac.h | 238 +++
>>>> include/uapi/linux/cxl_mem.h | 3 +
>>>
>>> So what's the plan here? Am I supposed to merge the EDAC/RAS bits
>>> through the RAS tree and then give folks an immutable branch or how
>>> do we want to proceed here?
>>>
>>
>> Dave Jiang / Rafael, what would work best for the two of you?
>>
>> To me Boris' suggestion makes sense, particularly as that avoids the
>> complexity of CXL get/set features being in multiple series.
>>
>> I think the split that would make sense is:
>>
>> EDAC immutable branch for:
>> 1: EDAC: Add support for EDAC device features control
>> 2: EDAC: Add scrub control feature
>> 3: EDAC: Add ECS control feature
>> 15: EDAC: Add memory repair control feature
>>
>> ACPI merges EDAC immutable +
>> 13: ACPI:RAS2: Add ACPI RAS2 driver
>> 14: ras: mem: Add memory ACPI RAS2 driver
>>
>> CXL merges EDAC immutable +
>> 4: cxl: Refactor user ioctl command path from mds to mailbox
>> 5: cxl: Add Get Supported Features command for kernel usage
>> 6: cxl/mbox: Add GET_FEATURE mailbox command
>> 7: cxl: Add Get Feature command support for user submission
>> 8: cxl/mbox: Add SET_FEATURE mailbox command
>> 9: cxl: Add Set Feature command support for user submission
>> 10: cxl: Add UUIDs for the CXL RAS features
>> 11: cxl/memfeature: Add CXL memory device patrol scrub control
>> feature
>> 12: cxl/memfeature: Add CXL memory device ECS control feature
>> 16: cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command
>> 17: cxl/memfeature: Add CXL memory device soft PPR control feature
>> 18: cxl/memfeature: Add CXL memory device memory sparing control
>> feature
>
>That works for me.
>
>DJ
>
>>
>> That does mean that the actual drivers/edac/ specific drivers land via
>> the ACPI and CXL trees only, but without another layer of immutable
>> branches we can't avoid that. Might cause merge conflicts in
>> Kconfig/Makefiles but otherwise shouldn't be too bad.
>>
>> There is going to be some noise in documentation as examples are added
>> to the docs with the actual drivers (whereas generic docs are
>> introduced with the infrastructure). I think that will work out though.
>> Shiju, could you spin this ordering up and check it all works
>> (incorporating Dave's updates to the GET / SET feature)?

I have rebased and reordered the series as suggested, and it tests fine.
I am waiting for some information before sharing the updated patches.

Thanks,
Shiju

>> Thanks,
>>
>> Jonathan
>