Re: [PATCH 3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1

From: WANG Chao
Date: Tue Apr 23 2019 - 22:50:31 EST


On 04/20/19 at 01:57P, Borislav Petkov wrote:
> On Thu, Apr 18, 2019 at 11:41:15AM +0800, WANG Chao wrote:
> > count_threshold == 1 isn't working as expected. CEC only does soft
> > offline the second time the same pfn is hit by a correctable error.
>
> So this?
>
> ---
> diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
> index b3c377ddf340..750a427e1a73 100644
> --- a/drivers/ras/cec.c
> +++ b/drivers/ras/cec.c
> @@ -333,6 +333,7 @@ int cec_add_elem(u64 pfn)
>
> mutex_lock(&ce_mutex);
>
> + /* Array full, free the LRU slot. */
> if (ca->n == MAX_ELEMS)
> WARN_ON(!del_lru_elem_unlocked(ca));
>
> @@ -346,14 +347,9 @@ int cec_add_elem(u64 pfn)
> (void *)&ca->array[to],
> (ca->n - to) * sizeof(u64));
>
> - ca->array[to] = (pfn << PAGE_SHIFT) |
> - (DECAY_MASK << COUNT_BITS) | 1;
> + ca->array[to] = (pfn << PAGE_SHIFT) | 1;
>
> ca->n++;
> -
> - ret = 0;
> -
> - goto decay;
> }
>
> count = COUNT(ca->array[to]);
> @@ -386,7 +382,6 @@ int cec_add_elem(u64 pfn)
> goto unlock;
> }
>
> -decay:
> ca->decay_count++;
>
> if (ca->decay_count >= CLEAN_ELEMS)

It looks good to me. Thanks for a better fix.
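
For anyone following along, here is a toy model of why dropping the early
"goto decay" on insertion changes the count_threshold == 1 case. It is only a
sketch and not the code in drivers/ras/cec.c: struct toy_cec, toy_add_elem()
and the early_exit_on_insert flag are made-up names, the array is unsorted,
there is no decay or LRU handling, and the threshold check is my assumption
about the part of cec_add_elem() that the hunks above do not show.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_ELEMS 8

/* Hypothetical stand-in for the CEC array: one counter per tracked pfn. */
struct toy_cec {
	uint64_t pfn[TOY_MAX_ELEMS];
	unsigned int count[TOY_MAX_ELEMS];
	int n;
};

/*
 * Record one correctable error for @pfn.
 * Returns true if this hit would trigger a soft-offline in the toy model.
 *
 * early_exit_on_insert == true  models the old flow (goto decay on insert).
 * early_exit_on_insert == false models falling through to the count check.
 */
static bool toy_add_elem(struct toy_cec *c, uint64_t pfn,
			 unsigned int threshold, bool early_exit_on_insert)
{
	int i;

	/* Linear search; the real code keeps a sorted array instead. */
	for (i = 0; i < c->n; i++)
		if (c->pfn[i] == pfn)
			break;

	if (i == c->n) {
		/* Toy model has no LRU eviction: just refuse when full. */
		if (c->n == TOY_MAX_ELEMS)
			return false;

		/* New element starts with a count of 1. */
		c->pfn[i] = pfn;
		c->count[i] = 1;
		c->n++;

		/* Old flow: leave before the threshold is ever looked at. */
		if (early_exit_on_insert)
			return false;
	}

	/* Assumed check: below the threshold, just bump the count. */
	if (c->count[i] < threshold) {
		c->count[i]++;
		return false;
	}

	/* Threshold reached: drop the element and report the offline. */
	c->pfn[i] = c->pfn[c->n - 1];
	c->count[i] = c->count[c->n - 1];
	c->n--;
	return true;
}
```

With threshold == 1, the early-exit variant reports nothing on the first hit
and only offlines on the second hit of the same pfn, which is the behaviour I
was seeing. The fall-through variant reaches the check on the very first hit.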