Re: [PATCH v7 01/22] sched: Favour predetermined active CPU as migration destination

From: Peter Zijlstra
Date: Wed May 26 2021 - 08:33:23 EST


On Wed, May 26, 2021 at 12:14:20PM +0100, Valentin Schneider wrote:
> On 25/05/21 16:14, Will Deacon wrote:

> > @@ -1956,12 +1958,8 @@ static int migration_cpu_stop(void *data)
> >  			complete = true;
> >  		}
> >  
> > -		if (dest_cpu < 0) {
> > -			if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask))
> > -				goto out;
> > -
> > -			dest_cpu = cpumask_any_distribute(&p->cpus_mask);
> > -		}
> > +		if (dest_mask && (cpumask_test_cpu(task_cpu(p), dest_mask)))
> > +			goto out;
> >  
>
> IIRC the reason we deferred the pick to migration_cpu_stop() was because of
> those insane races involving multiple SCA calls the likes of:
>
>   p->cpus_mask = [0, 1]; p on CPU0
>
>   CPUx                           CPUy                           CPU0
>
>   SCA(p, [2])
>     __do_set_cpus_allowed();
>     queue migration_cpu_stop()
>                                  SCA(p, [3])
>                                    __do_set_cpus_allowed();
>                                                                 migration_cpu_stop()
>
> The stopper needs to use the latest cpumask set by the second SCA despite
> having an arg->pending set up by the first SCA. Doesn't this break here?

Yep.
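
To make the failure concrete, here is a toy user-space replay of that
interleaving. It is only a sketch, not kernel code: every toy_* name is
made up for illustration, a cpumask is reduced to a plain bitmask, and
the three CPUs are replayed sequentially on one thread. It assumes the
destination is picked when the stopper is queued and that a second SCA
does not queue another stopper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy task: bit n of cpus_mask set means CPU n is allowed. */
struct toy_task {
	unsigned int cpus_mask;
	int cpu;			/* CPU the task currently sits on */
};

/* Toy stopper argument: the destination is frozen at queue time. */
struct toy_stop_arg {
	struct toy_task *task;
	int dest_cpu;
};

/* Lowest allowed CPU in the mask, or -1 if the mask is empty. */
int toy_first_cpu(unsigned int mask)
{
	int cpu;

	for (cpu = 0; cpu < 32; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return -1;
}

/*
 * Toy SCA: always updates the mask, but only the first caller queues
 * the stopper and therefore only the first caller picks dest_cpu.
 */
void toy_sca(struct toy_task *p, unsigned int new_mask,
	     struct toy_stop_arg *arg, bool *queued)
{
	p->cpus_mask = new_mask;
	if (*queued)
		return;

	arg->task = p;
	arg->dest_cpu = toy_first_cpu(new_mask);
	*queued = true;
}

/* Toy stopper: blindly trusts the destination chosen at queue time. */
void toy_stopper(struct toy_stop_arg *arg)
{
	arg->task->cpu = arg->dest_cpu;
}

/* Replays the race; returns true if p ends up outside its own mask. */
bool toy_race_leaves_task_outside_mask(void)
{
	struct toy_task p = { .cpus_mask = 0x3, .cpu = 0 };
	struct toy_stop_arg arg = { .task = NULL, .dest_cpu = -1 };
	bool queued = false;

	toy_sca(&p, 1u << 2, &arg, &queued);	/* CPUx: SCA(p, [2]) */
	toy_sca(&p, 1u << 3, &arg, &queued);	/* CPUy: SCA(p, [3]) */
	assert(arg.dest_cpu == 2);		/* still the first pick */
	toy_stopper(&arg);			/* CPU0: migration_cpu_stop() */

	return !(p.cpus_mask & (1u << p.cpu));
}
```

In this model the task lands on CPU2 while its mask only allows CPU3,
which is the breakage Valentin describes.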

> I'm not sure I've paged back in all of the subtleties lying in ambush
> here, but what about the below?
>
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 5226cc26a095..cd447c9db61d 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c

> @@ -1954,19 +1953,15 @@ static int migration_cpu_stop(void *data)
>  		if (pending) {
>  			p->migration_pending = NULL;
>  			complete = true;
>  
>  			if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask))
>  				goto out;
>  		}
>  
>  		if (task_on_rq_queued(p))
> +			rq = __migrate_task(rq, &rf, p, arg->dest_cpu);

> @@ -2249,7 +2244,7 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
>  			init_completion(&my_pending.done);
>  			my_pending.arg = (struct migration_arg) {
>  				.task = p,
> +				.dest_cpu = dest_cpu,
>  				.pending = &my_pending,
>  			};
>
> @@ -2257,6 +2252,7 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
>  		} else {
>  			pending = p->migration_pending;
>  			refcount_inc(&pending->refs);
> +			pending->arg.dest_cpu = dest_cpu;
>  		}
>  	}

Argh.. that might just work. But I'm thinking we wants comments this
time around :-) This is even more subtle.
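
For reference, a toy user-space model of the idea in the diff above,
replaying the same SCA(p, [2]) / SCA(p, [3]) interleaving. Again only a
sketch under stated assumptions: the model_* names are hypothetical, a
cpumask is a plain bitmask, there is no locking, and the interleaving
runs sequentially on one thread. The point it illustrates is that a
later SCA which finds an existing pending overwrites its dest_cpu, so
the stopper sees the latest pick.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the shared pending request. */
struct model_pending {
	int dest_cpu;		/* destination the stopper will use */
	int refs;		/* SCA callers sharing this request */
};

/* Toy task: bit n of cpus_mask set means CPU n is allowed. */
struct model_task {
	unsigned int cpus_mask;
	int cpu;
	struct model_pending *migration_pending;
};

/* Lowest allowed CPU in the mask, or -1 if the mask is empty. */
int model_first_cpu(unsigned int mask)
{
	int cpu;

	for (cpu = 0; cpu < 32; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return -1;
}

/*
 * Toy SCA: the first caller installs its own pending; a later caller
 * reuses the installed one and overwrites the destination.
 */
void model_sca(struct model_task *p, unsigned int new_mask,
	       struct model_pending *my_pending)
{
	int dest_cpu = model_first_cpu(new_mask);

	p->cpus_mask = new_mask;
	if (!p->migration_pending) {
		my_pending->dest_cpu = dest_cpu;
		my_pending->refs = 1;
		p->migration_pending = my_pending;
	} else {
		p->migration_pending->refs++;
		p->migration_pending->dest_cpu = dest_cpu;
	}
}

/* Toy stopper: skip if already on an allowed CPU, else use dest_cpu. */
void model_stopper(struct model_task *p)
{
	struct model_pending *pending = p->migration_pending;

	if (!pending)
		return;
	p->migration_pending = NULL;

	if (p->cpus_mask & (1u << p->cpu))
		return;

	p->cpu = pending->dest_cpu;
}

/* Replays the race and returns the CPU the task ends up on. */
int model_run_race(void)
{
	struct model_task p = { .cpus_mask = 0x3, .cpu = 0,
				.migration_pending = NULL };
	struct model_pending first = { 0, 0 }, second = { 0, 0 };

	model_sca(&p, 1u << 2, &first);		/* CPUx: SCA(p, [2]) */
	model_sca(&p, 1u << 3, &second);	/* CPUy: SCA(p, [3]) */
	model_stopper(&p);			/* CPU0: migration_cpu_stop() */
	assert(p.migration_pending == NULL);

	return p.cpu;
}
```

With the overwrite in place the model moves the task to CPU3, matching
the mask set by the second SCA. It says nothing about the locking or
the other paths in affine_move_task(), which is where the comments
would be needed.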