Re: [patch 00/13] sched/treewide: Clean up various racy task affinity issues

From: Peter Zijlstra
Date: Thu Apr 13 2017 - 05:02:44 EST


On Wed, Apr 12, 2017 at 10:07:26PM +0200, Thomas Gleixner wrote:
> While dealing with the fallout of the scheduler cleanups on RT, we found
> several racy usage sites of the following scheme:
>
> cpumask_copy(&save_cpus_allowed, &current->cpus_allowed);
> set_cpus_allowed_ptr(current, cpumask_of(cpu));
> do_stuff();
> set_cpus_allowed_ptr(current, &save_cpus_allowed);
>
> That's racy in two aspects:
>
> 1) Nothing prevents the CPU from being unplugged after the temporary
> affinity setting is in place. This results in code being executed on the
> wrong CPU(s).
>
> 2) Nothing prevents a concurrent affinity setting from user space. That
> also results in code being executed on the wrong CPU(s) and the restore
> of the previous affinity setting overwrites the new one.
>
> Various variants of cleanups:
>
> - Removal, because the calling thread is already guaranteed to run on the
> correct CPU.
>
> - Conversion to smp function calls (simple register read/write)
>
> - Conversion to work_on_cpu(). There were even files containing comments
> to that effect.
>
> - The rest needs separate hotplug protection for work_on_cpu(). To avoid
> open coding the
>
> get_online_cpus();
> if (cpu_online(cpu))
>         ret = do_stuff();
> else
>         ret = -ENODEV;
> put_online_cpus();
>
> scheme this series provides a new helper function work_on_cpu_safe()
> which implements the above.
>
> Aside from fixing these races, this allows a follow-up series to
> restrict access to current->cpus_allowed.

Looks good, thanks for tackling these!