Re: [PATCH 08/35] autonuma: introduce kthread_bind_node()

From: Peter Zijlstra
Date: Tue May 29 2012 - 13:05:19 EST


On Tue, 2012-05-29 at 18:11 +0200, Andrea Arcangeli wrote:
> On Tue, May 29, 2012 at 02:49:13PM +0200, Peter Zijlstra wrote:
> > On Fri, 2012-05-25 at 19:02 +0200, Andrea Arcangeli wrote:
> > > /**
> > > + * kthread_bind_node - bind a just-created kthread to the CPUs of a node.
> > > + * @p: thread created by kthread_create().
> > > + * @nid: node (might not be online, must be possible) for @p to run on.
> > > + *
> > > + * Description: This function is equivalent to set_cpus_allowed(),
> > > + * except that @nid doesn't need to be online, and the thread must be
> > > + * stopped (i.e., just returned from kthread_create()).
> > > + */
> > > +void kthread_bind_node(struct task_struct *p, int nid)
> > > +{
> > > +	/* Must have done schedule() in kthread() before we set_task_cpu */
> > > +	if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE)) {
> > > +		WARN_ON(1);
> > > +		return;
> > > +	}
> > > +
> > > +	/* It's safe because the task is inactive. */
> > > +	do_set_cpus_allowed(p, cpumask_of_node(nid));
> > > +	p->flags |= PF_THREAD_BOUND;
> >
> > No, I've said before, this is wrong. You should only ever use
> > PF_THREAD_BOUND when it's strictly per-cpu. Moving your numa threads
> > to a different node is silly but not fatal in any way.
>
> I changed the semantics of that bitflag, now it means: userland isn't
> allowed to shoot itself in the foot and mess with whatever CPU
> bindings the kernel has set for the kernel thread.

Yeah, and you did so without mentioning that in your changelog.
Furthermore I object to that change. I object even more strongly to
doing it without mention and keeping a misleading comment near the
definition.

> It'd be a clear regression not to set PF_THREAD_BOUND there. It would be
> even worse to remove the CPU binding to the node: it'd risk copying
> memory with both src and dst being in nodes remote from the CPU that
> knuma_migrate runs on (there aren't just 2-node systems out there).

Just teach each knuma_migrated what node it represents and don't use
numa_node_id().

That way you can change the affinity just fine, it'll be sub-optimal,
copying memory from node x to node y through node z, but it'll still
work correctly.
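
Roughly the idea, as a userspace model rather than kernel code. Every
name here (knuma_migrated_ctx, model_current_node, model_numa_node_id(),
the two migrate_target_*() helpers) is made up for illustration and is
not taken from the patch; model_numa_node_id() only stands in for
numa_node_id().

```c
/*
 * Userspace model only. Nothing here is from the autonuma patch set;
 * all identifiers are hypothetical.
 */

/* Per-daemon state: the node this knuma_migrated instance represents. */
struct knuma_migrated_ctx {
	int nid;
};

/* Models "the node of the CPU we currently happen to run on". */
int model_current_node;

/* Stand-in for numa_node_id() in this model. */
int model_numa_node_id(void)
{
	return model_current_node;
}

/*
 * Fragile variant: the node being served is derived from wherever the
 * thread runs, so it is only correct while the affinity never changes.
 */
int migrate_target_fragile(const struct knuma_migrated_ctx *ctx)
{
	(void)ctx;
	return model_numa_node_id();
}

/*
 * Robust variant: the node being served is the one recorded at creation
 * time, so changing the affinity costs locality but not correctness.
 */
int migrate_target_robust(const struct knuma_migrated_ctx *ctx)
{
	return ctx->nid;
}
```

With ctx.nid == 2 and the thread moved so that model_current_node == 3,
the fragile variant would start serving node 3 while the robust one
keeps serving node 2, which is the whole point.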

numa isn't special in the way per-cpu stuff is special.