Re: [RFC PATCH 10/15] nohz_task: Enter in extended quiescent state when in userspace

From: Paul E. McKenney
Date: Tue Dec 21 2010 - 21:20:58 EST


On Tue, Dec 21, 2010 at 10:49:43PM +0100, Frederic Weisbecker wrote:
> On Tue, Dec 21, 2010 at 11:28:49AM -0800, Paul E. McKenney wrote:
> > On Mon, Dec 20, 2010 at 04:24:17PM +0100, Frederic Weisbecker wrote:
> > > A nohz task can safely enter an extended quiescent state when
> > > it goes to userspace. This saves a remote CPU from having to
> > > interrupt the nohz task just to get it to report a quiescent state.
> > >
> > > We enter an extended quiescent state when:
> > >
> > > - A nohz task resumes to userspace and is running alone on the
> > > CPU (we check whether the local CPU is in nohz mode, which means
> > > no other task competes for that CPU). If the tick is still running,
> > > entering the extended QS is deferred to the second case:
> > >
> > > - The tick stops and we verify that the current task is a nohz
> > > one, is running alone on the CPU, and is running in userspace.
> > >
> > > We exit the extended quiescent state when:
> > >
> > > - A nohz task enters the kernel and is running alone on the CPU.
> > > Again, we check whether the local CPU is in nohz mode. If the
> > > tick is still running, we are not in an extended QS and there is
> > > nothing to do.
> > >
> > > - The tick restarts because a new task is enqueued.
> > >
> > > Whether or not the nohz task is in userspace is tracked by the
> > > per-CPU nohz_task_ext_qs variable.
> > >
> > > Architectures need to provide a backend that notifies userspace
> > > entry/exit in order to support this mode. It must implement the
> > > TIF_NOHZ flag, which switches syscalls to the slow path, and must
> > > notify exception entry/exit.
> > >
> > > We don't need to handle IRQs or NMIs, as those are already handled
> > > by RCU through the rcu_enter_irq/nmi helpers.
> >
> > One question below...
> >
> > > Signed-off-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> > > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > > Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> > > Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> > > Cc: Ingo Molnar <mingo@xxxxxxx>
> > > Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> > > Cc: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
> > > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > > Cc: Anton Blanchard <anton@xxxxxxxxxxx>
> > > Cc: Tim Pepper <lnxninja@xxxxxxxxxxxxxxxxxx>
> > > ---
> > > arch/Kconfig | 4 +++
> > > include/linux/tick.h | 16 ++++++++++-
> > > kernel/sched.c | 3 ++
> > > kernel/time/tick-sched.c | 61 +++++++++++++++++++++++++++++++++++++++++++++-
> > > 4 files changed, 81 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/Kconfig b/arch/Kconfig
> > > index e631791..d1ebea3 100644
> > > --- a/arch/Kconfig
> > > +++ b/arch/Kconfig
> > > @@ -177,5 +177,9 @@ config HAVE_ARCH_JUMP_LABEL
> > >
> > > config HAVE_NO_HZ_TASK
> > > bool
> > > + help
> > > + Features necessary hooks for a task wanting to enter nohz
> > > + while running alone on a CPU: thread flag for syscall hooks
> > > + and exceptions entry/exit hooks.
> > >
> > > source "kernel/gcov/Kconfig"
> > > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > > index 7465a47..a704bb7 100644
> > > --- a/include/linux/tick.h
> > > +++ b/include/linux/tick.h
> > > @@ -8,6 +8,7 @@
> > >
> > > #include <linux/clockchips.h>
> > > #include <linux/percpu-defs.h>
> > > +#include <asm/ptrace.h>
> > >
> > > #ifdef CONFIG_GENERIC_CLOCKEVENTS
> > >
> > > @@ -130,10 +131,21 @@ extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
> > >
> > > #ifdef CONFIG_NO_HZ_TASK
> > > DECLARE_PER_CPU(int, task_nohz_mode);
> > > +DECLARE_PER_CPU(int, nohz_task_ext_qs);
> > > +
> > > +extern void tick_nohz_task_enter_kernel(void);
> > > +extern void tick_nohz_task_exit_kernel(void);
> > > +extern void tick_nohz_task_enter_exception(struct pt_regs *regs);
> > > +extern void tick_nohz_task_exit_exception(struct pt_regs *regs);
> > > extern int tick_nohz_task_mode(void);
> > > -#else
> > > +
> > > +#else /* !NO_HZ_TASK */
> > > +static inline void tick_nohz_task_enter_kernel(void) { }
> > > +static inline void tick_nohz_task_exit_kernel(void) { }
> > > +static inline void tick_nohz_task_enter_exception(struct pt_regs *regs) { }
> > > +static inline void tick_nohz_task_exit_exception(struct pt_regs *regs) { }
> > > static inline int tick_nohz_task_mode(void) { return 0; }
> > > -#endif
> > > +#endif /* !NO_HZ_TASK */
> > >
> > > # else /* !NO_HZ */
> > > static inline void tick_nohz_stop_sched_tick(int inidle) { }
> > > diff --git a/kernel/sched.c b/kernel/sched.c
> > > index b99f192..4412493 100644
> > > --- a/kernel/sched.c
> > > +++ b/kernel/sched.c
> > > @@ -2464,6 +2464,9 @@ static void nohz_task_cpu_update(void *unused)
> > > if (rq->nr_running > 1 || rcu_pending(cpu) || rcu_needs_cpu(cpu)) {
> >
> > If the task enters a system call in nohz mode, and then that system call
> > enqueues an RCU callback, this code path will pull that CPU out of nohz
> > mode, right?
> >
> > Thanx, Paul
>
> Hmm, no, because this code path is only called after RCU or the scheduler
> sends an IPI, and RCU won't call it after it enqueues a callback.
>
> I did not think about that. If every other CPU is in an extended quiescent
> state, nobody will take care of completing the grace period unless we get
> lucky somewhere in the overall GP completion sequence. And the CPU that
> enqueues the callbacks is at least supposed to take care of completing
> that grace period, right?
>
> So I guess I need to restart the tick from there too.

Please!!! ;-)

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/