Re: [RFC][PATCH 1/4] sched: Fix a race between __kthread_bind() and sched_setaffinity()

From: Peter Zijlstra
Date: Fri Aug 07 2015 - 10:27:30 EST


On Fri, May 15, 2015 at 11:56:53AM -0400, Tejun Heo wrote:
> On Fri, May 15, 2015 at 05:43:34PM +0200, Peter Zijlstra wrote:
> > Because sched_setaffinity() checks p->flags & PF_NO_SETAFFINITY
> > without locks, a caller might observe an old value and race with the
> > set_cpus_allowed_ptr() call from __kthread_bind() and effectively undo
> > it.
> >
> >   __kthread_bind()
> >     do_set_cpus_allowed()
> >                                   <SYSCALL>
> >                                     sched_setaffinity()
> >                                       if (p->flags & PF_NO_SETAFFINITY)
> >                                       set_cpus_allowed_ptr()
> >     p->flags |= PF_NO_SETAFFINITY
> >
> > Fix the issue by putting everything under the regular scheduler locks.
> >
> > This also closes a hole in the serialization of
> > task_struct::{nr_,}cpus_allowed.
> >
> > Cc: Tejun Heo <tj@xxxxxxxxxx>
> > Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
>
> For workqueue part,
>
> Acked-by: Tejun Heo <tj@xxxxxxxxxx>
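
As a rough userspace illustration of the fix described in the quoted
changelog (test the flag and update the mask under the same lock), a
minimal sketch follows. Everything named model_* or MODEL_* is a
hypothetical stand-in, not a kernel API; the pthread mutex merely plays
the role of the scheduler locks, and the -EINVAL return is an assumption
about how a refused request is reported.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for PF_NO_SETAFFINITY; not the kernel's value. */
#define MODEL_NO_SETAFFINITY 0x1u

struct model_task {
	pthread_mutex_t lock;       /* plays the role of the scheduler locks */
	unsigned int flags;
	unsigned long cpus_allowed; /* one bit per CPU */
};

/* Bind side: set the mask and forbid later changes in one critical section. */
static void model_bind(struct model_task *t, unsigned long mask)
{
	assert(t != NULL);
	pthread_mutex_lock(&t->lock);
	t->cpus_allowed = mask;
	t->flags |= MODEL_NO_SETAFFINITY;
	pthread_mutex_unlock(&t->lock);
}

/*
 * Syscall side: the flag test and the mask write share the lock, so a
 * caller can no longer see a stale flag and then overwrite the bound mask.
 */
static int model_setaffinity(struct model_task *t, unsigned long mask)
{
	int ret = 0;

	assert(t != NULL);
	pthread_mutex_lock(&t->lock);
	if (t->flags & MODEL_NO_SETAFFINITY)
		ret = -EINVAL;
	else
		t->cpus_allowed = mask;
	pthread_mutex_unlock(&t->lock);
	return ret;
}
```

With the unlocked check, the syscall side could pass the flag test
before the bind side set the flag and then write its own mask after the
bind, which is the interleaving the changelog's diagram shows.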

Sorry for being very late on this, got sidetracked with other bits.

This threw up a warning on testing:

[ 2.443944] WARNING: CPU: 0 PID: 10 at kernel/kthread.c:333 __kthread_bind_mask+0x34/0x6e()
[ 2.446978] Modules linked in:
[ 2.448359] CPU: 0 PID: 10 Comm: khelper Not tainted 4.1.0-rc6-00314-g6455666 #4
[ 2.450990] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 2.454132] 0000000000000009 ffff88000f643d68 ffffffff81a3df14 0000000000000b02
[ 2.470295] 0000000000000000 ffff88000f643da8 ffffffff810f308f 000000000f643da8
[ 2.503291] ffffffff8110d116 ffff88000f55d580 ffff88000f5240c0 ffff88000f4936e0
[ 2.506510] Call Trace:
[ 2.520770] [<ffffffff81a3df14>] dump_stack+0x4c/0x65
[ 2.522479] [<ffffffff810f308f>] warn_slowpath_common+0xa1/0xbb
[ 2.524334] [<ffffffff8110d116>] ? __kthread_bind_mask+0x34/0x6e
[ 2.526219] [<ffffffff810f314c>] warn_slowpath_null+0x1a/0x1c
[ 2.528069] [<ffffffff8110d116>] __kthread_bind_mask+0x34/0x6e
[ 2.529925] [<ffffffff8110d381>] kthread_bind_mask+0x13/0x15
[ 2.531738] [<ffffffff8110679d>] worker_attach_to_pool+0x39/0x7c
[ 2.546650] [<ffffffff8110866b>] rescuer_thread+0x130/0x318
[ 2.548484] [<ffffffff8110853b>] ? cancel_delayed_work_sync+0x15/0x15
[ 2.550411] [<ffffffff8110853b>] ? cancel_delayed_work_sync+0x15/0x15
[ 2.552207] [<ffffffff8110cd0f>] kthread+0xf8/0x100
[ 2.553864] [<ffffffff8110cc17>] ? kthread_create_on_node+0x184/0x184
[ 2.555795] [<ffffffff81a457c2>] ret_from_fork+0x42/0x70
[ 2.557538] [<ffffffff8110cc17>] ? kthread_create_on_node+0x184/0x184
[ 2.572520] ---[ end trace 362b92c9255ab666 ]---

Which is the rescuer thread attaching itself to a pool that needs help,
and obviously the rescuer isn't a new thread, so kthread_bind_mask()
doesn't work right.

The best I could come up with is something like the below on top; does
that work for you? I'll go give it some runtime.

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1622,11 +1622,15 @@ static struct worker *alloc_worker(int n
  * cpu-[un]hotplugs.
  */
 static void worker_attach_to_pool(struct worker *worker,
-				   struct worker_pool *pool)
+				  struct worker_pool *pool,
+				  bool new)
 {
 	mutex_lock(&pool->attach_mutex);
 
-	kthread_bind_mask(worker->task, pool->attrs->cpumask);
+	if (new)
+		kthread_bind_mask(worker->task, pool->attrs->cpumask);
+	else
+		set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
 
 	/*
 	 * The pool->attach_mutex ensures %POOL_DISASSOCIATED remains
@@ -1712,7 +1716,7 @@ static struct worker *create_worker(stru
 	set_user_nice(worker->task, pool->attrs->nice);
 
 	/* successful, attach the worker to the pool */
-	worker_attach_to_pool(worker, pool);
+	worker_attach_to_pool(worker, pool, true);
 
 	/* start the newly created worker */
 	spin_lock_irq(&pool->lock);
@@ -2241,7 +2245,7 @@ static int rescuer_thread(void *__rescue
 
 		spin_unlock_irq(&wq_mayday_lock);
 
-		worker_attach_to_pool(rescuer, pool);
+		worker_attach_to_pool(rescuer, pool, false);
 
 		spin_lock_irq(&pool->lock);
 		rescuer->pool = pool;