Re: [PATCH 09/40] autonuma: introduce kthread_bind_node()

From: Johannes Weiner
Date: Thu Jul 05 2012 - 09:09:15 EST


On Fri, Jun 29, 2012 at 12:58:01PM -0400, Rik van Riel wrote:
> On 06/29/2012 12:38 PM, Andrea Arcangeli wrote:
> >On Fri, Jun 29, 2012 at 11:36:26AM -0400, Rik van Riel wrote:
> >>On 06/28/2012 08:55 AM, Andrea Arcangeli wrote:
> >>
> >>>--- a/include/linux/sched.h
> >>>+++ b/include/linux/sched.h
> >>>@@ -1792,7 +1792,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
> >>> #define PF_SWAPWRITE 0x00800000 /* Allowed to write to swap */
> >>> #define PF_SPREAD_PAGE 0x01000000 /* Spread page cache over cpuset */
> >>> #define PF_SPREAD_SLAB 0x02000000 /* Spread some slab caches over cpuset */
> >>>-#define PF_THREAD_BOUND 0x04000000 /* Thread bound to specific cpu */
> >>>+#define PF_THREAD_BOUND 0x04000000 /* Thread bound to specific cpus */
> >>> #define PF_MCE_EARLY 0x08000000 /* Early kill for mce process policy */
> >>> #define PF_MEMPOLICY 0x10000000 /* Non-default NUMA mempolicy */
> >>> #define PF_MUTEX_TESTER 0x20000000 /* Thread belongs to the rt mutex tester */
> >>
> >>Changing the semantics of PF_THREAD_BOUND without so much as
> >>a comment in your changelog or buy-in from the scheduler
> >>maintainers is a big no-no.
> >>
> >>Is there any reason you even need PF_THREAD_BOUND in your
> >>kernel numa threads?
> >>
> >>I do not see much at all in the scheduler code that uses
> >>PF_THREAD_BOUND and it is not clear at all that your
> >>numa threads get any benefit from them...
> >>
> >>Why do you think you need it?
>
> >This flag is only used to prevent userland to mess with the kernel CPU
> >binds of kernel threads. It is used to avoid the root user to shoot
> >itself in the foot.
> >
> >So far it has been used to prevent changing bindings to a single
> >CPU. I'm setting it also after making a multiple-cpu bind (all CPUs of
> >the node, instead of just 1 CPU).
>
> Fair enough. Looking at the scheduler code some more, I
> see that all PF_THREAD_BOUND seems to do is block userspace
> from changing a thread's CPU bindings.
>
> Peter and Ingo, what is the special magic in PF_THREAD_BOUND
> that should make it only apply to kernel threads that are bound
> to a single CPU?

In the very first review iteration of AutoNUMA, Peter argued that the
scheduler people want to use this flag in other places where they rely
on it meaning a single CPU, not a group of them (check out the
cpumask test in debug_smp_processor_id() in lib/smp_processor_id.c).
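
For readers without the tree at hand, the idea behind that cpumask test
can be sketched in plain userspace C. This is only a model under stated
assumptions: the 64-bit mask type and the model_* names are hypothetical
stand-ins, not the kernel's cpumask API, and the real function does more
than this one comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model: one bit per CPU, CPUs 0..63 only. */
typedef uint64_t model_cpumask_t;

/* Mask with only `cpu` set; a stand-in for the kernel's cpumask_of(). */
static model_cpumask_t model_cpumask_of(unsigned int cpu)
{
	return (model_cpumask_t)1 << cpu;
}

/*
 * True only when the allowed mask is exactly the current CPU.
 * A node-wide mask (several CPUs) fails this test even if it
 * contains this_cpu, which is why "bound" has to mean one CPU here.
 */
static bool model_bound_to_this_cpu(model_cpumask_t allowed,
				    unsigned int this_cpu)
{
	return allowed == model_cpumask_of(this_cpu);
}
```

With this model, a thread allowed on CPUs 2 and 3 is not "bound" from
the point of view of code running on CPU 2, while a thread allowed only
on CPU 2 is.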

He also argued that preventing root from rebinding the numa daemons is
not critical to this feature at all. And I have to agree.
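
As a rough illustration of what is being given up, the protection
discussed in the quoted text amounts to refusing affinity changes for
flagged threads. The sketch below is a simplified userspace model: the
struct, the function and the MODEL_ prefix are hypothetical, only the
flag value is taken from the quoted sched.h hunk, and the scheduler's
real check has conditions this model leaves out.

```c
#include <errno.h>
#include <stdint.h>

/* Flag value as shown in the quoted include/linux/sched.h hunk. */
#define MODEL_PF_THREAD_BOUND 0x04000000u

/* Hypothetical, minimal stand-in for a task: flags plus allowed CPUs. */
struct model_task {
	unsigned int flags;
	uint64_t cpus_allowed;	/* one bit per CPU */
};

/*
 * Model of an affinity change request coming from userspace.
 * Flagged tasks keep their binding; unflagged tasks take the new mask.
 */
static int model_set_affinity(struct model_task *t, uint64_t new_mask)
{
	if (new_mask == 0)
		return -EINVAL;	/* an empty mask is never valid */
	if (t->flags & MODEL_PF_THREAD_BOUND)
		return -EINVAL;	/* bound thread: refuse the rebind */
	t->cpus_allowed = new_mask;
	return 0;
}
```

Dropping the flag from the numa daemons simply means they take the
second path: root can rebind them, and nothing else in the model
changes.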

I certainly think this is NOT the change to make a stand on in this
patch set, seriously. It is a nice-to-have that costs little to drop,
while keeping it creates contention.

A flag that other daemons could use as well can always be brought in
later, but that really should be a separate effort, and I don't think
anything is lost by dropping the change from this series.