Re: [PATCH v1] RAS/CEC: Memory Corrected Errors consistent event filtering
From: Borislav Petkov
Date: Fri Apr 02 2021 - 13:07:39 EST
On Fri, Apr 02, 2021 at 06:00:42PM +0200, William Roche wrote:
> Corrected Errors are not the best indicators for a failing DIMM
In the OS, errors reported through different mechanisms are all we have.
> For the moment we will have the CE MCE handled by the MCE_HANDLED_CEC
> aware notifiers only when a page is off-lined, like it used to be.
>
> Can we start with that small fix ?
Sure but do two variables pls - an "err" one which catches the called
function's retval and a "ret" one which cec_add_elem() itself returns so
that there's no confusion like it was before:
---
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index ddecf25b5dd4..b926c679cdaf 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -312,8 +312,8 @@ static bool sanity_check(struct ce_array *ca)
 static int cec_add_elem(u64 pfn)
 {
 	struct ce_array *ca = &ce_arr;
+	int count, err, ret = 0;
 	unsigned int to = 0;
-	int count, ret = 0;

 	/*
 	 * We can be called very early on the identify_cpu() path where we are
@@ -330,8 +330,8 @@ static int cec_add_elem(u64 pfn)
 	if (ca->n == MAX_ELEMS)
 		WARN_ON(!del_lru_elem_unlocked(ca));

-	ret = find_elem(ca, pfn, &to);
-	if (ret < 0) {
+	err = find_elem(ca, pfn, &to);
+	if (err < 0) {
 		/*
 		 * Shift range [to-end] to make room for one more element.
 		 */
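
To illustrate the err/ret split outside of the kernel, here is a minimal
userspace sketch. It is not the code in drivers/ras/cec.c: the names
ce_array_sketch, find_elem_sketch(), add_elem_sketch() and the THRESHOLD
value are made up for the example, and it assumes the lookup returns the
element index (>= 0) on a hit and a negative errno on a miss. The point is
only that a non-zero index from the lookup must not leak out as the
caller-visible "page was off-lined" result.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_ELEMS 8	/* hypothetical capacity for the sketch */
#define THRESHOLD 3	/* hypothetical: "offline" a pfn on its 3rd hit */

struct elem_sketch {
	unsigned long long pfn;
	int count;
};

struct ce_array_sketch {
	struct elem_sketch e[MAX_ELEMS];	/* kept sorted by pfn */
	int n;
};

/*
 * Binary search. Returns the index (>= 0) when pfn is present, -ENOENT
 * otherwise. In both cases *to is the slot where pfn is or belongs.
 */
static int find_elem_sketch(const struct ce_array_sketch *ca,
			    unsigned long long pfn, unsigned int *to)
{
	int lo = 0, hi = ca->n - 1;

	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;

		if (ca->e[mid].pfn == pfn) {
			*to = mid;
			return mid;
		}
		if (ca->e[mid].pfn < pfn)
			lo = mid + 1;
		else
			hi = mid - 1;
	}

	*to = lo;
	return -ENOENT;
}

/* Returns 1 if the pfn hit THRESHOLD and was dropped, 0 if not, < 0 on error. */
static int add_elem_sketch(struct ce_array_sketch *ca, unsigned long long pfn)
{
	unsigned int to = 0;
	int err, ret = 0;

	/* err only holds the lookup result; it never becomes our retval. */
	err = find_elem_sketch(ca, pfn, &to);
	if (err < 0) {
		if (ca->n == MAX_ELEMS)
			return -ENOSPC;	/* the sketch has no LRU eviction */

		/* Shift range [to-end] to make room for one more element. */
		memmove(&ca->e[to + 1], &ca->e[to],
			(size_t)(ca->n - to) * sizeof(ca->e[0]));
		ca->e[to].pfn = pfn;
		ca->e[to].count = 0;
		ca->n++;
	}

	if (++ca->e[to].count >= THRESHOLD) {
		/* Stand-in for off-lining the page: drop the element. */
		memmove(&ca->e[to], &ca->e[to + 1],
			(size_t)(ca->n - to - 1) * sizeof(ca->e[0]));
		ca->n--;
		ret = 1;
	}

	return ret;
}
```

With a single variable, a hit at index 1 would have been returned as 1 and
read by the caller as "handled". Here a repeated pfn returns 0 until the
threshold hit, whatever index it sits at.
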
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette