Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs

From: Paul E. McKenney
Date: Tue Jul 08 2014 - 18:06:11 EST


On Tue, Jul 08, 2014 at 10:40:11PM +0200, Frederic Weisbecker wrote:
> On Tue, Jul 08, 2014 at 12:58:37PM -0700, Paul E. McKenney wrote:
> > On Tue, Jul 08, 2014 at 08:38:47PM +0200, Frederic Weisbecker wrote:
> > > On Tue, Jul 08, 2014 at 08:47:23AM -0700, Paul E. McKenney wrote:
> > > > On Tue, Jul 08, 2014 at 05:24:00PM +0200, Frederic Weisbecker wrote:
> > > > > On Mon, Jul 07, 2014 at 03:38:15PM -0700, Paul E. McKenney wrote:
> > > > > > From: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>
> > > > > >
> > > > > > Binding the grace-period kthreads to the timekeeping CPU resulted in
> > > > > > significant performance decreases for some workloads. For more detail,
> > > > > > see:
> > > > > >
> > > > > > https://lkml.org/lkml/2014/6/3/395 for benchmark numbers
> > > > > >
> > > > > > https://lkml.org/lkml/2014/6/4/218 for CPU statistics
> > > > > >
> > > > > > It turns out that it is necessary to bind the grace-period kthreads
> > > > > > to the timekeeping CPU only when all but CPU 0 is a nohz_full CPU
> > > > > > on the one hand or if CONFIG_NO_HZ_FULL_SYSIDLE=y on the other.
> > > > > > In other cases, it suffices to bind the grace-period kthreads to the
> > > > > > set of non-nohz_full CPUs.
> > > > > >
> > > > > > This commit therefore creates a tick_nohz_not_full_mask that is the
> > > > > > complement of tick_nohz_full_mask, and then binds the grace-period
> > > > > > kthread to the set of CPUs indicated by this new mask, which covers
> > > > > > the CONFIG_NO_HZ_FULL_SYSIDLE=n case. The CONFIG_NO_HZ_FULL_SYSIDLE=y
> > > > > > case still binds the grace-period kthreads to the timekeeping CPU.
> > > > > > This commit also includes the tick_nohz_full_enabled() check suggested
> > > > > > by Frederic Weisbecker.
> > > > > >
> > > > > > Reported-by: Jet Chen <jet.chen@xxxxxxxxx>
> > > > > > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> > > > > > [ paulmck: Created housekeeping_affine() per fweisbec feedback. ]
> > > > > > ---
> > > > > > include/linux/tick.h | 19 +++++++++++++++++++
> > > > > > kernel/rcu/tree_plugin.h | 14 +++++++++-----
> > > > > > kernel/time/tick-sched.c | 10 ++++++++++
> > > > > > 3 files changed, 38 insertions(+), 5 deletions(-)
> > > > > >
> > > > > > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > > > > > index b84773cb9f4c..c39af3261351 100644
> > > > > > --- a/include/linux/tick.h
> > > > > > +++ b/include/linux/tick.h
> > > > > > @@ -12,6 +12,7 @@
> > > > > > #include <linux/hrtimer.h>
> > > > > > #include <linux/context_tracking_state.h>
> > > > > > #include <linux/cpumask.h>
> > > > > > +#include <linux/sched.h>
> > > > > >
> > > > > > #ifdef CONFIG_GENERIC_CLOCKEVENTS
> > > > > >
> > > > > > @@ -162,6 +163,7 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
> > > > > > #ifdef CONFIG_NO_HZ_FULL
> > > > > > extern bool tick_nohz_full_running;
> > > > > > extern cpumask_var_t tick_nohz_full_mask;
> > > > > > +extern cpumask_var_t tick_nohz_not_full_mask;
> > > > >
> > > > > So I'm still puzzled by this mask.
> > > > >
> > > > > How about creating a temporary cpumask on top of tick_nohz_full_mask
> > > > > from housekeeping_affine().
> > > > >
> > > > > If you wonder about performance, this can be called once for good from
> > > > > rcu_spawn_gp_kthread() (that would be much better than checking that all
> > > > > the time from the kthread itself anyway).
> > > >
> > > > I was figuring that a fair number of the kthreads might eventually
> > > > be using this, not just for the grace-period kthreads.
> > >
> > > Ok makes sense. But can we just rename the cpumask to housekeeping_mask?
> >
> > Good point! After all, it someday might be something other than the
> > complement of tick_nohz_full_mask.
> >
> > > > In addition,
> > > > my concern about once-for-good affinity is for the NO_HZ_FULL_SYSIDLE=y
> > > > case, where moving the grace-period kthreads prevents ever entering
> > > > full-system idle state.
> > > >
> > > > Or am I missing some use case?
> > >
> > > No that's what I had in mind. But rcu_spawn_gp_kthread() still looks like
> > > a better place for that. Or am I missing something else?
> >
> > My fear was that sysadmins would move it in the NO_HZ_FULL_SYSIDLE=y
> > case, in which case it needs to move back.
>
> If you use kthread_bind(), this can't be overriden by the user. See PF_NO_SETAFFINITY.

Fair point. This would be a kthread_bind_housekeeping(), then.

But I need to create an kthread_bind_mask() or some such that acts like
kthread_bind(), but which takes a cpumask rather than a cpu. Easy
enough to do, but sounds like 3.18 material rather than 3.17 material.
So this is on my 3.18 list.

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/