RE: [PATCH v18 04/19] EDAC: Add memory repair control feature

From: Shiju Jose
Date: Tue Jan 14 2025 - 09:31:33 EST


>-----Original Message-----
>From: Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx>
>Sent: 14 January 2025 13:47
>To: Shiju Jose <shiju.jose@xxxxxxxxxx>
>Cc: linux-edac@xxxxxxxxxxxxxxx; linux-cxl@xxxxxxxxxxxxxxx; linux-
>acpi@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>bp@xxxxxxxxx; tony.luck@xxxxxxxxx; rafael@xxxxxxxxxx; lenb@xxxxxxxxxx;
>mchehab@xxxxxxxxxx; dan.j.williams@xxxxxxxxx; dave@xxxxxxxxxxxx; Jonathan
>Cameron <jonathan.cameron@xxxxxxxxxx>; dave.jiang@xxxxxxxxx;
>alison.schofield@xxxxxxxxx; vishal.l.verma@xxxxxxxxx; ira.weiny@xxxxxxxxx;
>david@xxxxxxxxxx; Vilas.Sridharan@xxxxxxx; leo.duran@xxxxxxx;
>Yazen.Ghannam@xxxxxxx; rientjes@xxxxxxxxxx; jiaqiyan@xxxxxxxxxx;
>Jon.Grimm@xxxxxxx; dave.hansen@xxxxxxxxxxxxxxx;
>naoya.horiguchi@xxxxxxx; james.morse@xxxxxxx; jthoughton@xxxxxxxxxx;
>somasundaram.a@xxxxxxx; erdemaktas@xxxxxxxxxx; pgonda@xxxxxxxxxx;
>duenwen@xxxxxxxxxx; gthelen@xxxxxxxxxx;
>wschwartz@xxxxxxxxxxxxxxxxxxx; dferguson@xxxxxxxxxxxxxxxxxxx;
>wbs@xxxxxxxxxxxxxxxxxxxxxx; nifan.cxl@xxxxxxxxx; tanxiaofei
><tanxiaofei@xxxxxxxxxx>; Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>; Roberto
>Sassu <roberto.sassu@xxxxxxxxxx>; kangkang.shen@xxxxxxxxxxxxx;
>wanghuiqiang <wanghuiqiang@xxxxxxxxxx>; Linuxarm
><linuxarm@xxxxxxxxxx>
>Subject: Re: [PATCH v18 04/19] EDAC: Add memory repair control feature
>
>Em Mon, 6 Jan 2025 12:10:00 +0000
><shiju.jose@xxxxxxxxxx> escreveu:
>
>> +What: /sys/bus/edac/devices/<dev-name>/mem_repairX/repair_function
>> +Date: Jan 2025
>> +KernelVersion: 6.14
>> +Contact: linux-edac@xxxxxxxxxxxxxxx
>> +Description:
>> + (RO) Memory repair function type. For eg. post package repair,
>> + memory sparing etc.
>> + EDAC_SOFT_PPR - Soft post package repair
>> + EDAC_HARD_PPR - Hard post package repair
>> + EDAC_CACHELINE_MEM_SPARING - Cacheline memory sparing
>> + EDAC_ROW_MEM_SPARING - Row memory sparing
>> + EDAC_BANK_MEM_SPARING - Bank memory sparing
>> + EDAC_RANK_MEM_SPARING - Rank memory sparing
>> + All other values are reserved.
>> +
>> +What: /sys/bus/edac/devices/<dev-name>/mem_repairX/persist_mode
>> +Date: Jan 2025
>> +KernelVersion: 6.14
>> +Contact: linux-edac@xxxxxxxxxxxxxxx
>> +Description:
>> + (RW) Read/Write the current persist repair mode set for a
>> + repair function. Persist repair modes supported in the
>> + device, based on the memory repair function is temporary
>> + or permanent and is lost with a power cycle.
>> + EDAC_MEM_REPAIR_SOFT - Soft repair function (temporary repair).
>> + EDAC_MEM_REPAIR_HARD - Hard memory repair function (permanent repair).
>> + All other values are reserved.
>> +
>
>After re-reading some things, I suspect that the above can be simplified a little
>bit by folding soft/hard PPR into a single element at /repair_function, and letting
>it clearer that persist_mode is valid only for PPR (I think this is the case, right?),
>e.g. something like:
persist_mode is valid for the memory sparing features as well (at least in CXL).
In the case of CXL memory sparing, the host has the option to request either soft or hard
sparing via a flag when issuing a memory sparing operation.

>
> What: /sys/bus/edac/devices/<dev-name>/mem_repairX/repair_function
> ...
> Description:
> (RO) Memory repair function type. For e.g. post package repair,
> memory sparing etc. Valid values are:
>
> - ppr - post package repair.
> Please define its mode via
> /sys/bus/edac/devices/<dev-name>/mem_repairX/persist_mode
> - cacheline-sparing - Cacheline memory sparing
> - row-sparing - Row memory sparing
> - bank-sparing - Bank memory sparing
> - rank-sparing - Rank memory sparing
> - All other values are reserved.
>
>and define persist_mode in a different way:
Note: To return decoded strings instead of raw values, I would need to add extra callback
functions in edac/memory_repair.c for these attributes, which would reduce the current level
of optimization done to minimize the code size.
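
To illustrate what the decoding itself would involve, here is a minimal sketch of a
value-to-string lookup, with soft and hard PPR folded into "ppr" as suggested. The enum
names are taken from the ABI text above, but the numeric values, the COUNT sentinel, the
table, and the helper name are assumptions for illustration only, not code from the patch:

```c
/* Names follow the quoted ABI text; the numeric values are placeholders. */
enum demo_repair_function {
	EDAC_SOFT_PPR,
	EDAC_HARD_PPR,
	EDAC_CACHELINE_MEM_SPARING,
	EDAC_ROW_MEM_SPARING,
	EDAC_BANK_MEM_SPARING,
	EDAC_RANK_MEM_SPARING,
	DEMO_REPAIR_FUNCTION_COUNT	/* hypothetical sentinel, not in the patch */
};

/* Soft and hard PPR both decode to "ppr", as in the proposal above. */
static const char *const demo_repair_function_names[DEMO_REPAIR_FUNCTION_COUNT] = {
	[EDAC_SOFT_PPR]              = "ppr",
	[EDAC_HARD_PPR]              = "ppr",
	[EDAC_CACHELINE_MEM_SPARING] = "cacheline-sparing",
	[EDAC_ROW_MEM_SPARING]       = "row-sparing",
	[EDAC_BANK_MEM_SPARING]      = "bank-sparing",
	[EDAC_RANK_MEM_SPARING]      = "rank-sparing",
};

/* Hypothetical helper: map a raw value to its string, or "reserved". */
static const char *demo_repair_function_name(unsigned int fn)
{
	if (fn >= DEMO_REPAIR_FUNCTION_COUNT)
		return "reserved";
	return demo_repair_function_names[fn];
}
```

The table is small; the extra cost I am referring to is the per-attribute show callbacks
needed to call such a helper, rather than the lookup itself.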
>
> What: /sys/bus/edac/devices/<dev-name>/mem_repairX/ppr_persist_mode
Same as above: persist_mode is needed for the memory sparing features too.
> ...
> Description:
> (RW) Read/Write the current persist repair (PPR) mode set for a
> post package repair function. Persist repair modes supported
> in the device, based on the memory repair function is temporary
> or permanent and is lost with a power cycle. Valid values are:
>
> - repair-soft - Soft PPR function (temporary repair).
> - repair-hard - Hard memory repair function (permanent repair).
> - All other values are reserved.
>
>Thanks,
>Mauro

Thanks,
Shiju