Re: [PATCH RFC nohz_full v2 6/7] nohz_full: Add full-system-idle state machine

From: Paul E. McKenney
Date: Mon Jul 01 2013 - 18:51:22 EST


On Mon, Jul 01, 2013 at 11:38:34PM +0200, Frederic Weisbecker wrote:
> On Fri, Jun 28, 2013 at 01:10:21PM -0700, Paul E. McKenney wrote:
> > +/*
> > + * Check to see if the system is fully idle, other than the timekeeping CPU.
> > + * The caller must have disabled interrupts.
> > + */
> > +bool rcu_sys_is_idle(void)
>
> Where is this function called? I can't find any caller in the patchset.

It should be called at the point where the timekeeping CPU is going
idle. If it returns true, then the timekeeping CPU can shut off the
scheduling-clock interrupt.
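A rough user-space model of that call site may help. This is a sketch under stated assumptions, not kernel code. timekeeper_idle_enter(), stop_sched_clock_tick(), tick_stopped and RCU_SYSIDLE_NOT are hypothetical names. rcu_sys_is_idle() is cut down to the RCU_SYSIDLE_FULL to RCU_SYSIDLE_FULL_NOTED cmpxchg from the patch, with no CPU scan and no grace period. C11 atomics stand in for ACCESS_ONCE() and cmpxchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified states: the patch only shows that values
 * below RCU_SYSIDLE_FULL mean "not yet fully idle", so those are
 * collapsed into RCU_SYSIDLE_NOT here.
 */
enum {
	RCU_SYSIDLE_NOT,
	RCU_SYSIDLE_FULL,
	RCU_SYSIDLE_FULL_NOTED,
};

static atomic_int full_sysidle_state = RCU_SYSIDLE_NOT;
static const int tick_do_timer_cpu = 0;	/* the timekeeping CPU */
static bool tick_stopped;			/* hypothetical tick status */

/*
 * Cut-down model of rcu_sys_is_idle(). The cpu argument replaces
 * smp_processor_id(), and assert() stands in for WARN_ON_ONCE().
 */
static bool rcu_sys_is_idle(int cpu)
{
	int rss = atomic_load(&full_sysidle_state);

	assert(cpu == tick_do_timer_cpu);

	if (rss == RCU_SYSIDLE_FULL) {
		/* First observation: succeeds only if the state is still FULL. */
		int expected = RCU_SYSIDLE_FULL;

		return atomic_compare_exchange_strong(&full_sysidle_state,
						      &expected,
						      RCU_SYSIDLE_FULL_NOTED);
	}

	/* Already noted as fully idle: tell the caller again. */
	return rss == RCU_SYSIDLE_FULL_NOTED;
}

/* Hypothetical stand-in for shutting off the scheduling-clock interrupt. */
static void stop_sched_clock_tick(void)
{
	tick_stopped = true;
}

/* Hypothetical idle-entry hook run by the timekeeping CPU. */
static void timekeeper_idle_enter(int cpu)
{
	if (rcu_sys_is_idle(cpu))
		stop_sched_clock_tick();
}
```

While the state is below RCU_SYSIDLE_FULL, timekeeper_idle_enter(0) leaves the tick running. Once the state reaches RCU_SYSIDLE_FULL, the next call moves it to RCU_SYSIDLE_FULL_NOTED and stops the tick, and later calls keep returning true.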

> > +{
> > + static struct rcu_sysidle_head rsh;
> > + int rss = ACCESS_ONCE(full_sysidle_state);
> > +
> > + WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
> > +
> > + /* Handle small-system case by doing a full scan of CPUs. */
> > + if (nr_cpu_ids <= RCU_SYSIDLE_SMALL && rss < RCU_SYSIDLE_FULL) {
> > + int cpu;
> > + bool isidle = true;
> > + unsigned long maxj = jiffies - ULONG_MAX / 4;
> > + struct rcu_data *rdp;
> > +
> > + /* Scan all the CPUs looking for nonidle CPUs. */
> > + for_each_possible_cpu(cpu) {
> > + rdp = per_cpu_ptr(rcu_sysidle_state->rda, cpu);
> > + rcu_sysidle_check_cpu(rdp, &isidle, &maxj);
> > + if (!isidle)
> > + break;
> > + }
> > + rcu_sysidle_report(rcu_sysidle_state, isidle, maxj);
> > + rss = ACCESS_ONCE(full_sysidle_state);
> > + }
> > +
> > + /* If this is the first observation of an idle period, record it. */
> > + if (rss == RCU_SYSIDLE_FULL) {
> > + rss = cmpxchg(&full_sysidle_state,
> > + RCU_SYSIDLE_FULL, RCU_SYSIDLE_FULL_NOTED);
> > + return rss == RCU_SYSIDLE_FULL;
> > + }
> > +
> > + smp_mb(); /* ensure rss load happens before later caller actions. */
> > +
> > + /* If already fully idle, tell the caller (in case of races). */
> > + if (rss == RCU_SYSIDLE_FULL_NOTED)
> > + return true;
> > +
> > + /*
> > + * If we aren't there yet, and a grace period is not in flight,
> > + * initiate a grace period. Either way, tell the caller that
> > + * we are not there yet.
> > + */
> > + if (nr_cpu_ids > RCU_SYSIDLE_SMALL &&
> > + !rcu_gp_in_progress(rcu_sysidle_state) &&
> > + !rsh.inuse && xchg(&rsh.inuse, 1) == 0)
> > + call_rcu(&rsh.rh, rcu_sysidle_cb);
>
> So this starts an RCU/RCU_preempt grace period to force the global idle
> detection.
>
> Would it make sense to create a new RCU flavour instead for this purpose?
> Its only per-CPU quiescent state would be when the timekeeping CPU ticks
> (from rcu_check_callbacks()). The other CPUs would only complete their
> QS request through extended quiescent states, i.e., only the timekeeping
> CPU would be burdened.
>
> This way you can enqueue a callback that is executed at the end of the
> grace period for that flavour, and that callback can somehow help drive
> the state machine.
>
> Now maybe that's not a good idea, because this adds some overhead to
> any code that uses for_each_rcu_flavor().

It would also add overhead. Besides, the most active RCU flavor will
almost always have a grace period in flight, so the call_rcu() above
should rarely be invoked on most systems.
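The final condition in rcu_sys_is_idle() is what keeps that call_rcu() cold. When a grace period is already in flight, nothing is enqueued. Otherwise, the plain read of rsh.inuse followed by the xchg() lets at most one caller queue the callback. The following user-space sketch models only that guard with C11 atomics. The nr_cpu_ids check is omitted. gp_in_progress, enqueue_count, fake_call_rcu(), sysidle_cb() and maybe_start_gp() are hypothetical stand-ins. The assumption that the callback clears inuse when it runs is inferred from the guard rather than shown in the quoted hunks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the patch's static struct rcu_sysidle_head rsh. */
struct sysidle_head {
	atomic_int inuse;	/* nonzero while a callback is queued */
};

static struct sysidle_head rsh;
static bool gp_in_progress;	/* stand-in for rcu_gp_in_progress() */
static int enqueue_count;	/* number of times "call_rcu()" ran */

/* Stand-in for call_rcu(): only counts the enqueue. */
static void fake_call_rcu(struct sysidle_head *head)
{
	(void)head;
	enqueue_count++;
}

/* Stand-in for rcu_sysidle_cb(): assumed to release the slot. */
static void sysidle_cb(struct sysidle_head *head)
{
	assert(atomic_load(&head->inuse) == 1);	/* only runs if queued */
	atomic_store(&head->inuse, 0);
}

/* Models the guard at the end of rcu_sys_is_idle(), minus nr_cpu_ids. */
static void maybe_start_gp(void)
{
	if (!gp_in_progress &&
	    !atomic_load_explicit(&rsh.inuse, memory_order_relaxed) &&
	    atomic_exchange(&rsh.inuse, 1) == 0)
		fake_call_rcu(&rsh);
}
```

Calling maybe_start_gp() twice in a row enqueues once. After sysidle_cb() releases the slot, another call enqueues again, but only if gp_in_progress is false. On a busy system the gp_in_progress check short-circuits first, so the atomic_exchange() is rarely reached.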

Thanx, Paul

> > + return false;
> > }
> >
> > /*
> > @@ -2494,6 +2734,21 @@ static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int irq)
> > {
> > }
> >
> > +static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle,
> > + unsigned long *maxj)
> > +{
> > +}
> > +
> > +static bool is_sysidle_rcu_state(struct rcu_state *rsp)
> > +{
> > + return false;
> > +}
> > +
> > +static void rcu_sysidle_report(struct rcu_state *rsp, int isidle,
> > + unsigned long maxj)
> > +{
> > +}
> > +
> > static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp)
> > {
> > }
> > --
> > 1.8.1.5
> >
>
