Re: [PATCH 2/2] ras: close the race condition with timer

From: Borislav Petkov
Date: Tue Apr 16 2019 - 05:58:56 EST


On Mon, Apr 15, 2019 at 06:20:01PM -0700, Cong Wang wrote:
> cec_timer_fn() is a timer callback which reads ce_arr.array[]
> and updates its decay values. Elements could be added to or
> removed from this global array in parallel, although the array
> itself will not grow or shrink. del_lru_elem_unlocked() uses
> FULL_COUNT() as a key to find a right element to remove,
> which could be affected by the parallel timer.
>
> Fix this by converting the mutex to a spinlock and holding it
> inside the timer.
>
> Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
> Cc: Tony Luck <tony.luck@xxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Signed-off-by: Cong Wang <xiyou.wangcong@xxxxxxxxx>
> ---
> drivers/ras/cec.c | 16 +++++++++-------
> 1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
> index 61332c9aab5a..a82c9d08d47a 100644
> --- a/drivers/ras/cec.c
> +++ b/drivers/ras/cec.c
> @@ -117,7 +117,7 @@ static struct ce_array {
> };
> } ce_arr;
>
> -static DEFINE_MUTEX(ce_mutex);
> +static DEFINE_SPINLOCK(ce_lock);

Nah, pls keep the simplicity here by retaining the mutex. Use a
workqueue instead to queue the spring cleaning from IRQ context and then
do it when back in preemptible context.

Thx.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.