Re: [PATCH 17/22] sched/core: fix opencoded cpumask_any_but()

From: Peter Zijlstra
Date: Tue May 10 2022 - 12:38:16 EST


On Tue, May 10, 2022 at 08:47:45AM -0700, Yury Norov wrote:
> sched_core_cpu_starting() and sched_core_cpu_deactivate() implement
> opencoded cpumask_any_but(). Fix it.
>
> CC: Ben Segall <bsegall@xxxxxxxxxx>
> CC: Daniel Bristot de Oliveira <bristot@xxxxxxxxxx>
> CC: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
> CC: Ingo Molnar <mingo@xxxxxxxxxx>
> CC: Juri Lelli <juri.lelli@xxxxxxxxxx>
> CC: Mel Gorman <mgorman@xxxxxxx>
> CC: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> CC: Steven Rostedt <rostedt@xxxxxxxxxxx>
> CC: Valentin Schneider <vschneid@xxxxxxxxxx>
> CC: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> CC: linux-kernel@xxxxxxxxxxxxxxx
> Signed-off-by: Yury Norov <yury.norov@xxxxxxxxx>
> ---
> kernel/sched/core.c | 33 +++++++++++++--------------------
> 1 file changed, 13 insertions(+), 20 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f5ebc392493d..9700001948d0 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6125,7 +6125,7 @@ static void queue_core_balance(struct rq *rq)
> static void sched_core_cpu_starting(unsigned int cpu)
> {
> const struct cpumask *smt_mask = cpu_smt_mask(cpu);
> - struct rq *rq = cpu_rq(cpu), *core_rq = NULL;
> + struct rq *rq = cpu_rq(cpu), *core_rq;
> unsigned long flags;
> int t;
>
> @@ -6138,19 +6138,16 @@ static void sched_core_cpu_starting(unsigned int cpu)
> goto unlock;
>
> /* find the leader */
> - for_each_cpu(t, smt_mask) {
> - if (t == cpu)
> - continue;
> - rq = cpu_rq(t);
> - if (rq->core == rq) {
> - core_rq = rq;
> - break;
> - }
> - }
> + t = cpumask_any_but(smt_mask, cpu);
> + if (t >= nr_cpu_ids)
> + goto unlock;
>
> - if (WARN_ON_ONCE(!core_rq)) /* whoopsie */
> + rq = cpu_rq(t);
> + if (WARN_ON_ONCE(rq->core != rq)) /* whoopsie */
> goto unlock;
>
> + core_rq = rq;
> +
> /* install and validate core_rq */
> for_each_cpu(t, smt_mask) {
> rq = cpu_rq(t);

I don't think this is equivalent. Imagine SMT4, with:

rqN->core_rq = rq0

Now, further suppose smt0-2 are online and we're about to online smt3.
Then t above is free to be smt2, which then results in insta triggering:

+ if (WARN_ON_ONCE(rq->core != rq)) /* whoopsie */

You seem to have lost how the first loop searches for rq->core.

Please, be more careful. Also, all of this is super cold path don't
bother with optimizations. Much of the patches you have in this series
fall under that.