Re: [PATCH 1/2] sched: Fix balance_callback()

From: Valentin Schneider
Date: Fri Sep 11 2020 - 08:22:57 EST



On 11/09/20 09:17, Peter Zijlstra wrote:
> The intent of balance_callback() has always been to delay executing
> balancing operations until the end of the current rq->lock section.
> This is because balance operations must often drop rq->lock, and that
> isn't safe in general.
>
> However, as noted by Scott, there were a few holes in that scheme;
> balance_callback() was called after rq->lock was dropped, which means
> another CPU can interleave and touch the callback list.
>

So that can be say __schedule() tail racing with some setprio; what's the
worst that can (currently) happen here? Something like say two consecutive
enqueuing of push_rt_tasks() to the callback list?

> Rework code to call the balance callbacks before dropping rq->lock
> where possible, and otherwise splice the balance list onto a local
> stack.
>
> This guarantees that the balance list must be empty when we take
> rq->lock. IOW, we'll only ever run our own balance callbacks.
>

Makes sense to me.

Reviewed-by: Valentin Schneider <valentin.schneider@xxxxxxx>

> Reported-by: Scott Wood <swood@xxxxxxxxxx>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>

[...]

> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1220,6 +1220,8 @@ static inline void rq_pin_lock(struct rq
> #ifdef CONFIG_SCHED_DEBUG
> rq->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
> rf->clock_update_flags = 0;
> +
> + SCHED_WARN_ON(rq->balance_callback);

Clever!

> #endif
> }
>