Re: [PATCH v18 04/19] EDAC: Add memory repair control feature

From: Mauro Carvalho Chehab
Date: Wed Jan 15 2025 - 07:04:23 EST


On Tue, 14 Jan 2025 14:30:53 +0000
Shiju Jose <shiju.jose@xxxxxxxxxx> wrote:

> >-----Original Message-----
> >From: Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx>
> >Sent: 14 January 2025 13:47
> >To: Shiju Jose <shiju.jose@xxxxxxxxxx>
> >Cc: linux-edac@xxxxxxxxxxxxxxx; linux-cxl@xxxxxxxxxxxxxxx; linux-
> >acpi@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> >bp@xxxxxxxxx; tony.luck@xxxxxxxxx; rafael@xxxxxxxxxx; lenb@xxxxxxxxxx;
> >mchehab@xxxxxxxxxx; dan.j.williams@xxxxxxxxx; dave@xxxxxxxxxxxx; Jonathan
> >Cameron <jonathan.cameron@xxxxxxxxxx>; dave.jiang@xxxxxxxxx;
> >alison.schofield@xxxxxxxxx; vishal.l.verma@xxxxxxxxx; ira.weiny@xxxxxxxxx;
> >david@xxxxxxxxxx; Vilas.Sridharan@xxxxxxx; leo.duran@xxxxxxx;
> >Yazen.Ghannam@xxxxxxx; rientjes@xxxxxxxxxx; jiaqiyan@xxxxxxxxxx;
> >Jon.Grimm@xxxxxxx; dave.hansen@xxxxxxxxxxxxxxx;
> >naoya.horiguchi@xxxxxxx; james.morse@xxxxxxx; jthoughton@xxxxxxxxxx;
> >somasundaram.a@xxxxxxx; erdemaktas@xxxxxxxxxx; pgonda@xxxxxxxxxx;
> >duenwen@xxxxxxxxxx; gthelen@xxxxxxxxxx;
> >wschwartz@xxxxxxxxxxxxxxxxxxx; dferguson@xxxxxxxxxxxxxxxxxxx;
> >wbs@xxxxxxxxxxxxxxxxxxxxxx; nifan.cxl@xxxxxxxxx; tanxiaofei
> ><tanxiaofei@xxxxxxxxxx>; Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>; Roberto
> >Sassu <roberto.sassu@xxxxxxxxxx>; kangkang.shen@xxxxxxxxxxxxx;
> >wanghuiqiang <wanghuiqiang@xxxxxxxxxx>; Linuxarm
> ><linuxarm@xxxxxxxxxx>
> >Subject: Re: [PATCH v18 04/19] EDAC: Add memory repair control feature
> >
> >On Mon, 6 Jan 2025 12:10:00 +0000
> ><shiju.jose@xxxxxxxxxx> wrote:
> >
> >> +What: /sys/bus/edac/devices/<dev-name>/mem_repairX/repair_function
> >> +Date: Jan 2025
> >> +KernelVersion: 6.14
> >> +Contact: linux-edac@xxxxxxxxxxxxxxx
> >> +Description:
> >> + (RO) Memory repair function type. For eg. post package repair,
> >> + memory sparing etc.
> >> + EDAC_SOFT_PPR - Soft post package repair
> >> + EDAC_HARD_PPR - Hard post package repair
> >> + EDAC_CACHELINE_MEM_SPARING - Cacheline memory sparing
> >> + EDAC_ROW_MEM_SPARING - Row memory sparing
> >> + EDAC_BANK_MEM_SPARING - Bank memory sparing
> >> + EDAC_RANK_MEM_SPARING - Rank memory sparing
> >> + All other values are reserved.
> >> +
> >> +What: /sys/bus/edac/devices/<dev-name>/mem_repairX/persist_mode
> >> +Date: Jan 2025
> >> +KernelVersion: 6.14
> >> +Contact: linux-edac@xxxxxxxxxxxxxxx
> >> +Description:
> >> + (RW) Read/Write the current persist repair mode set for a
> >> + repair function. Persist repair modes supported in the
> >> + device, based on the memory repair function is temporary
> >> + or permanent and is lost with a power cycle.
> >> + EDAC_MEM_REPAIR_SOFT - Soft repair function (temporary repair).
> >> + EDAC_MEM_REPAIR_HARD - Hard memory repair function (permanent repair).
> >> + All other values are reserved.
> >> +
> >
> >After re-reading some things, I suspect that the above can be simplified a little
> >bit by folding soft/hard PPR into a single element at /repair_function, and making
> >it clearer that persist_mode is valid only for PPR (I think this is the case, right?),
> >e.g. something like:
> persist_mode is valid for memory sparing features (at least in CXL) as well.
> In the case of CXL memory sparing, the host has the option to request either soft or hard sparing
> via a flag when issuing a memory sparing operation.

Ok.

>
> >
> > What: /sys/bus/edac/devices/<dev-name>/mem_repairX/repair_function
> > ...
> > Description:
> > (RO) Memory repair function type. For e.g. post package repair,
> > memory sparing etc. Valid values are:
> >
> > - ppr - post package repair.
> > Please define its mode via
> > /sys/bus/edac/devices/<dev-name>/mem_repairX/persist_mode
> > - cacheline-sparing - Cacheline memory sparing
> > - row-sparing - Row memory sparing
> > - bank-sparing - Bank memory sparing
> > - rank-sparing - Rank memory sparing
> > - All other values are reserved.
> >
> >and define persist_mode in a different way:
> Note: To return decoded strings instead of raw values, I would need to add some extra callback function(s)
> in edac/memory_repair.c for these attributes, which would reduce the current level of optimization done to
> minimize the code size.

You're already using a callback in the EDAC_MEM_REPAIR_ATTR_SHOW macro,
so the current code needs no change except for the type passed to the
EDAC_MEM_REPAIR_ATTR_SHOW() call.

Something similar to this (not tested) would work:

int get_repair_function(struct device *dev, void *drv_data, const char **val)
{
	/* Both PPR variants map to "ppr"; soft vs. hard is left to persist_mode. */
	static const char * const repair_type[] = {
		[EDAC_SOFT_PPR] = "ppr",
		[EDAC_HARD_PPR] = "ppr",
		[EDAC_CACHELINE_MEM_SPARING] = "cacheline-sparing",
		...
	};
	unsigned int type;

	// Some logic to get repair type from *drv_data, storing into "unsigned int type"

	/* Reject out-of-range values and any gaps left by the designated initializers. */
	if (type < ARRAY_SIZE(repair_type) && repair_type[type]) {
		*val = repair_type[type];
		return 0;
	}

	return -EINVAL;
}

EDAC_MEM_REPAIR_ATTR_SHOW(repair_function, get_repair_function, const char *, "%s\n");
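
To try the lookup itself outside the kernel, here is a minimal user-space
sketch (also not tested against the real driver). The enum below is a
hypothetical stand-in with assumed values, not the definition from the patch,
ARRAY_SIZE() is redefined locally, and repair_function_name() is an
illustrative helper name. The strings are the ones proposed for
repair_function above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the patch's enum; the numeric values are assumptions. */
enum edac_mem_repair_function {
	EDAC_SOFT_PPR,
	EDAC_HARD_PPR,
	EDAC_CACHELINE_MEM_SPARING,
	EDAC_ROW_MEM_SPARING,
	EDAC_BANK_MEM_SPARING,
	EDAC_RANK_MEM_SPARING,
};

/* Local equivalent of the kernel's ARRAY_SIZE() helper. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Soft and hard PPR share one name; the mode would come from persist_mode. */
static const char * const repair_type[] = {
	[EDAC_SOFT_PPR]              = "ppr",
	[EDAC_HARD_PPR]              = "ppr",
	[EDAC_CACHELINE_MEM_SPARING] = "cacheline-sparing",
	[EDAC_ROW_MEM_SPARING]       = "row-sparing",
	[EDAC_BANK_MEM_SPARING]      = "bank-sparing",
	[EDAC_RANK_MEM_SPARING]      = "rank-sparing",
};

/* Illustrative helper: map a raw repair type to its sysfs string. */
static int repair_function_name(unsigned int type, const char **val)
{
	/* Out-of-range values and unnamed (NULL) slots are both invalid. */
	if (type < ARRAY_SIZE(repair_type) && repair_type[type]) {
		*val = repair_type[type];
		return 0;
	}

	return -EINVAL;
}
```

The NULL check only matters if the real enum ends up with gaps between
values; with the assumed contiguous values it is just a safety net.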

Thanks,
Mauro