Re: [PATCH -v2 10/17] sched: Fix migrate_disable() vs set_cpus_allowed_ptr()

From: Peter Zijlstra
Date: Mon Oct 12 2020 - 07:37:19 EST


On Fri, Oct 09, 2020 at 01:22:00AM +0800, Tao Zhou wrote:
> On Mon, Oct 05, 2020 at 04:57:27PM +0200, Peter Zijlstra wrote:
> > +/*
> > + * This function is wildly self concurrent, consider at least 3 times.
> > + */
>
> More than that

Probably. I meant to write a coherent comment, but it got very long and
not so very coherent.

It should probably enumerate all the various cases with diagrams like
those in this email:

https://lkml.kernel.org/r/jhj3637lzdm.mognet@xxxxxxx

So we have:

- set_affinity() vs migrate_disable()
- set_affinity() + N*set_affinity() vs migrate_disable()
(ie. N additional waiters)
- set_affinity() vs migrate_disable() vs set_affinity()
(ie. the case from the above email)

And possibly some others.

If you have cycles to spend on writing that comment, that'd be great,
otherwise it'll stay on the todo list for a little while longer.

> > +static int affine_move_task(struct rq *rq, struct rq_flags *rf,
> > + struct task_struct *p, int dest_cpu, unsigned int flags)
> > +{
> > + struct set_affinity_pending my_pending = { }, *pending = NULL;
> > + struct migration_arg arg = {
> > + .task = p,
> > + .dest_cpu = dest_cpu,
> > + };
> > + bool complete = false;
> > +
> > + /* Can the task run on the task's current CPU? If so, we're done */
> > + if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) {
> > + pending = p->migration_pending;
> > + if (pending) {
> > + p->migration_pending = NULL;
> > + complete = true;
> > + }
> > + task_rq_unlock(rq, p, rf);
> > +
> > + if (complete)
> > + goto do_complete;
> > +
> > + return 0;
> > + }
> > +
> > + if (!(flags & SCA_MIGRATE_ENABLE)) {
> > + /* serialized by p->pi_lock */
> > + if (!p->migration_pending) {
> > + refcount_set(&my_pending.refs, 1);
> > + init_completion(&my_pending.done);
> > + p->migration_pending = &my_pending;
> > + } else {
> > + pending = p->migration_pending;
>
> The above load can be omited, no ?
>
> > + refcount_inc(&pending->refs);

That would put ^ in trouble...

> > + }
> > + }
> > + pending = p->migration_pending;