Re: [PATCH tip/core/rcu 45/47] rcu: Allow nesting ofrcu_idle_enter() and rcu_idle_exit()

From: Frederic Weisbecker
Date: Wed Feb 08 2012 - 23:07:12 EST


On Fri, Feb 03, 2012 at 05:45:20PM -0800, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>
>
> Use of RCU in the idle loop is incorrect, quite a few instances of
> just that have made their way into mainline, primarily event tracing.
> The problem with RCU read-side critical sections on CPUs that RCU believes
> to be idle is that RCU is completely ignoring the CPU, along with any
> attempts and RCU read-side critical sections.
>
> The approaches of eliminating the offending uses and of pushing the
> definition of idle down beyond the offending uses have both proved
> impractical. The new approach is to encapsulate offending uses of RCU
> with rcu_idle_exit() and rcu_idle_enter(), but this requires nesting
> for code that is invoked both during idle and and during normal execution.
> Therefore, this commit modifies rcu_idle_enter() and rcu_idle_exit() to
> permit nesting.
>
> Signed-off-by: Paul E. McKenney <paul.mckenney@xxxxxxxxxx>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> Reviewed-by: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> Acked-by: Deepthi Dharwar <deepthi@xxxxxxxxxxxxxxxxxx>
> ---
> kernel/rcu.h | 21 ++++++++++++++++++++-
> kernel/rcutiny.c | 16 ++++++++++++----
> kernel/rcutree.c | 21 ++++++++++++++-------
> 3 files changed, 46 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/rcu.h b/kernel/rcu.h
> index 30876f4..8ba99cd 100644
> --- a/kernel/rcu.h
> +++ b/kernel/rcu.h
> @@ -33,8 +33,27 @@
> * Process-level increment to ->dynticks_nesting field. This allows for
> * architectures that use half-interrupts and half-exceptions from
> * process context.
> + *
> + * DYNTICK_TASK_NEST_MASK defines a field of width DYNTICK_TASK_NEST_WIDTH
> + * that counts the number of process-based reasons why RCU cannot
> + * consider the corresponding CPU to be idle, and DYNTICK_TASK_NEST_VALUE
> + * is the value used to increment or decrement this field.
> + *
> + * The rest of the bits could in principle be used to count interrupts,
> + * but this would mean that a negative-one value in the interrupt
> + * field could incorrectly zero out the DYNTICK_TASK_NEST_MASK field.
> + * We therefore provide a two-bit guard field defined by DYNTICK_TASK_MASK
> + * that is set to DYNTICK_TASK_FLAG upon initial exit from idle.
> + * The DYNTICK_TASK_EXIT_IDLE value is thus the combined value used upon
> + * initial exit from idle.
> */
> -#define DYNTICK_TASK_NESTING (LLONG_MAX / 2 - 1)
> +#define DYNTICK_TASK_NEST_WIDTH 7
> +#define DYNTICK_TASK_NEST_VALUE ((LLONG_MAX >> DYNTICK_TASK_NEST_WIDTH) + 1)
> +#define DYNTICK_TASK_NEST_MASK (LLONG_MAX - DYNTICK_TASK_NEST_VALUE + 1)
> +#define DYNTICK_TASK_FLAG ((DYNTICK_TASK_NEST_VALUE / 8) * 2)
> +#define DYNTICK_TASK_MASK ((DYNTICK_TASK_NEST_VALUE / 8) * 3)

There is one unused bit between DYNTICK_TASK_NEST_MASK and DYNTICK_TASK_MASK, is
that intentional?

Also do you want to allow nesting of that kind?

rcu_idle_enter();
rcu_idle_enter();
rcu_idle_exit();
rcu_idle_exit()

in which case I guess that rcu_irq_enter()/rcu_irq_exit() also need to
be updated.

If we have this:

rcu_idle_enter()
rcu_idle_enter()

rcu_irq_enter()
rcu_irq_exit()

rcu_idle_exit()
rcu_idle_exit()

On rcu_irq_enter(), oldval will never be 0 and we'll miss rcu_idle_exit_common().
rcu_irq_exit() has a similar problem as it won't enter rcu_idle_enter_common().

Its check on WARN_ON_ONCE(rdtp->dynticks_nesting < 0) is also wrong because after
two calls of rcu_idle_enter(), the value of dynticks_nesting is negative : it's
-DYNTICK_TASK_NEST_VALUE.

Perhaps this change would allow that. But again that's just in case you need to
support that kind of nesting.

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index eacc10b..0b7d946 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -430,8 +430,8 @@ void rcu_irq_exit(void)
rdtp = &__get_cpu_var(rcu_dynticks);
oldval = rdtp->dynticks_nesting;
rdtp->dynticks_nesting--;
- WARN_ON_ONCE(rdtp->dynticks_nesting < 0);
- if (rdtp->dynticks_nesting)
+ WARN_ON_ONCE(!oldval);
+ if (rdtp->dynticks_nesting & ~DYNTICK_TASK_NEST_MASK)
trace_rcu_dyntick("--=", oldval, rdtp->dynticks_nesting);
else
rcu_idle_enter_common(rdtp, oldval);
@@ -525,8 +525,8 @@ void rcu_irq_enter(void)
rdtp = &__get_cpu_var(rcu_dynticks);
oldval = rdtp->dynticks_nesting;
rdtp->dynticks_nesting++;
- WARN_ON_ONCE(rdtp->dynticks_nesting == 0);
- if (oldval)
+ WARN_ON_ONCE(oldval == ~DYNTICK_TASK_NEST_MASK);
+ if (oldval & ~DYNTICK_TASK_NEST_MASK)
trace_rcu_dyntick("++=", oldval, rdtp->dynticks_nesting);
else
rcu_idle_exit_common(rdtp, oldval);



> +#define DYNTICK_TASK_EXIT_IDLE (DYNTICK_TASK_NEST_VALUE + \
> + DYNTICK_TASK_FLAG)
>
> /*
> * debug_rcu_head_queue()/debug_rcu_head_unqueue() are used internally
> diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
> index 4eb34fc..c8b0e15 100644
> --- a/kernel/rcutiny.c
> +++ b/kernel/rcutiny.c
> @@ -53,7 +53,7 @@ static void __call_rcu(struct rcu_head *head,
>
> #include "rcutiny_plugin.h"
>
> -static long long rcu_dynticks_nesting = DYNTICK_TASK_NESTING;
> +static long long rcu_dynticks_nesting = DYNTICK_TASK_EXIT_IDLE;
>
> /* Common code for rcu_idle_enter() and rcu_irq_exit(), see kernel/rcutree.c. */
> static void rcu_idle_enter_common(long long oldval)
> @@ -88,7 +88,12 @@ void rcu_idle_enter(void)
>
> local_irq_save(flags);
> oldval = rcu_dynticks_nesting;
> - rcu_dynticks_nesting = 0;
> + WARN_ON_ONCE((rcu_dynticks_nesting & DYNTICK_TASK_NEST_MASK) == 0);
> + if ((rcu_dynticks_nesting & DYNTICK_TASK_NEST_MASK) ==
> + DYNTICK_TASK_NEST_VALUE)
> + rcu_dynticks_nesting = 0;
> + else
> + rcu_dynticks_nesting -= DYNTICK_TASK_NEST_VALUE;
> rcu_idle_enter_common(oldval);
> local_irq_restore(flags);
> }
> @@ -140,8 +145,11 @@ void rcu_idle_exit(void)
>
> local_irq_save(flags);
> oldval = rcu_dynticks_nesting;
> - WARN_ON_ONCE(oldval != 0);
> - rcu_dynticks_nesting = DYNTICK_TASK_NESTING;
> + WARN_ON_ONCE(rcu_dynticks_nesting < 0);
> + if (rcu_dynticks_nesting & DYNTICK_TASK_NEST_MASK)
> + rcu_dynticks_nesting += DYNTICK_TASK_NEST_VALUE;
> + else
> + rcu_dynticks_nesting = DYNTICK_TASK_EXIT_IDLE;
> rcu_idle_exit_common(oldval);
> local_irq_restore(flags);
> }
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index df0e3c1..92b4776 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -198,7 +198,7 @@ void rcu_note_context_switch(int cpu)
> EXPORT_SYMBOL_GPL(rcu_note_context_switch);
>
> DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
> - .dynticks_nesting = DYNTICK_TASK_NESTING,
> + .dynticks_nesting = DYNTICK_TASK_EXIT_IDLE,
> .dynticks = ATOMIC_INIT(1),
> };
>
> @@ -394,7 +394,11 @@ void rcu_idle_enter(void)
> local_irq_save(flags);
> rdtp = &__get_cpu_var(rcu_dynticks);
> oldval = rdtp->dynticks_nesting;
> - rdtp->dynticks_nesting = 0;
> + WARN_ON_ONCE((oldval & DYNTICK_TASK_NEST_MASK) == 0);
> + if ((oldval & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE)
> + rdtp->dynticks_nesting = 0;
> + else
> + rdtp->dynticks_nesting -= DYNTICK_TASK_NEST_VALUE;
> rcu_idle_enter_common(rdtp, oldval);
> local_irq_restore(flags);
> }
> @@ -467,7 +471,7 @@ static void rcu_idle_exit_common(struct rcu_dynticks *rdtp, long long oldval)
> * Exit idle mode, in other words, -enter- the mode in which RCU
> * read-side critical sections can occur.
> *
> - * We crowbar the ->dynticks_nesting field to DYNTICK_TASK_NESTING to
> + * We crowbar the ->dynticks_nesting field to DYNTICK_TASK_NEST to
> * allow for the possibility of usermode upcalls messing up our count
> * of interrupt nesting level during the busy period that is just
> * now starting.
> @@ -481,8 +485,11 @@ void rcu_idle_exit(void)
> local_irq_save(flags);
> rdtp = &__get_cpu_var(rcu_dynticks);
> oldval = rdtp->dynticks_nesting;
> - WARN_ON_ONCE(oldval != 0);
> - rdtp->dynticks_nesting = DYNTICK_TASK_NESTING;
> + WARN_ON_ONCE(oldval < 0);
> + if (oldval & DYNTICK_TASK_NEST_MASK)
> + rdtp->dynticks_nesting += DYNTICK_TASK_NEST_VALUE;
> + else
> + rdtp->dynticks_nesting = DYNTICK_TASK_EXIT_IDLE;
> rcu_idle_exit_common(rdtp, oldval);
> local_irq_restore(flags);
> }
> @@ -2253,7 +2260,7 @@ rcu_boot_init_percpu_data(int cpu, struct rcu_state *rsp)
> rdp->qlen_lazy = 0;
> rdp->qlen = 0;
> rdp->dynticks = &per_cpu(rcu_dynticks, cpu);
> - WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != DYNTICK_TASK_NESTING);
> + WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != DYNTICK_TASK_EXIT_IDLE);
> WARN_ON_ONCE(atomic_read(&rdp->dynticks->dynticks) != 1);
> rdp->cpu = cpu;
> rdp->rsp = rsp;
> @@ -2281,7 +2288,7 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptible)
> rdp->qlen_last_fqs_check = 0;
> rdp->n_force_qs_snap = rsp->n_force_qs;
> rdp->blimit = blimit;
> - rdp->dynticks->dynticks_nesting = DYNTICK_TASK_NESTING;
> + rdp->dynticks->dynticks_nesting = DYNTICK_TASK_EXIT_IDLE;
> atomic_set(&rdp->dynticks->dynticks,
> (atomic_read(&rdp->dynticks->dynticks) & ~0x1) + 1);
> rcu_prepare_for_idle_init(cpu);
> --
> 1.7.8
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/