Re: [RFC][PATCH 1/7] sched: Fix smp_call_function_single_async() usage for ILB

From: Valentin Schneider
Date: Fri May 29 2020 - 11:26:15 EST



On 26/05/20 17:10, Peter Zijlstra wrote:
> The recent commit: 90b5363acd47 ("sched: Clean up scheduler_ipi()")
> got smp_call_function_single_async() subtly wrong. Even though it will
> return -EBUSY when trying to re-use a csd, that condition is not
> atomic and still requires external serialization.
>
> The change in kick_ilb() got this wrong.
>
> While on first reading kick_ilb() has an atomic test-and-set that
> appears to serialize the use, the matching 'release' is not in the
> right place to actually guarantee this serialization.
>

I got confused the first time I read through that, and had the same
thoughts as Vincent; reading Frederic's reply and the thread
helped. Could we mention the tick softirq vs ilb kick race
thing in the changelog?

Otherwise the change looks okay, I couldn't convince myself there were
more gaps left to fill.

Reviewed-by: Valentin Schneider <valentin.schneider@xxxxxxx>

> Rework the nohz_idle_balance() trigger so that the release is in the
> IPI callback and thus guarantees the required serialization for the
> CSD.
>
> Fixes: 90b5363acd47 ("sched: Clean up scheduler_ipi()")
> Reported-by: Qian Cai <cai@xxxxxx>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>