Re: Mysterious CFQ crash and RCU

From: Paul E. McKenney
Date: Tue May 24 2011 - 11:51:24 EST


On Tue, May 24, 2011 at 08:42:38AM -0700, Paul E. McKenney wrote:
> On Tue, May 24, 2011 at 04:51:44PM +0200, Jens Axboe wrote:
> > On 2011-05-24 16:35, Paul E. McKenney wrote:
> > > On Tue, May 24, 2011 at 11:41:10AM +0200, Jens Axboe wrote:
> > >> On 2011-05-24 00:20, Paul Bolle wrote:
> > >>> On Mon, 2011-05-23 at 08:38 -0700, Paul E. McKenney wrote:
> > >>>> Running under CONFIG_PREEMPT=y (along with CONFIG_TREE_PREEMPT_RCU=y)
> > >>>> could be very helpful in and of itself. CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
> > >>>> can also be helpful. In post-2.6.39 mainline, it should be possible
> > >>>> to set CONFIG_DEBUG_OBJECTS_RCU_HEAD=y without CONFIG_PREEMPT=y, but
> > >>>> again, CONFIG_PREEMPT=y can help find problems.
> > >>>
> > >>> 0) The first thing I tried (from your suggestions) was
> > >>> CONFIG_DEBUG_OBJECTS_RCU_HEAD=y. Given its dependencies (and, well, the
> > >>> build system I used) I ended up with:
> > >>>
> > >>> $ grep -e PREEMPT -e RCU /boot/config-2.6.39-0.local3.fc16.i686 |
> > >>> grep -v "^#"
> > >>> CONFIG_TREE_PREEMPT_RCU=y
> > >>> CONFIG_PREEMPT_RCU=y
> > >>> CONFIG_RCU_FANOUT=32
> > >>> CONFIG_PREEMPT_NOTIFIERS=y
> > >>> CONFIG_PREEMPT=y
> > >>> CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
> > >>> CONFIG_DEBUG_PREEMPT=y
> > >>> CONFIG_PROVE_RCU=y
> > >>> CONFIG_SPARSE_RCU_POINTER=y
> > >>>
> > >>> It looks like I am unable to trigger the issue we're talking about here
> > >>> when using that config.
> > >>>
> > >>> 1) For reference, the config of a kernel that does trigger it had:
> > >>>
> > >>> $ grep -e PREEMPT -e RCU /boot/config-2.6.39-0.local2.fc16.i686 |
> > >>> grep -v "^#"
> > >>> CONFIG_TREE_RCU=y
> > >>> CONFIG_RCU_FANOUT=32
> > >>> CONFIG_RCU_FAST_NO_HZ=y
> > >>> CONFIG_PREEMPT_NOTIFIERS=y
> > >>> CONFIG_PREEMPT_VOLUNTARY=y
> > >>> CONFIG_PROVE_RCU=y
> > >>> CONFIG_SPARSE_RCU_POINTER=y
> > >>>
> > >>>>> Again CONFIG_TREE_PREEMPT_RCU is available only if PREEMPT=y. So should
> > >>>>> we enable preemtion and CONFIG_TREE_PREEMPT_RCU=y and try to reproduce
> > >>>>> the issue?
> > >>>>
> > >>>> Please!
> > >>>
> > >>> 2) It appears I can't reproduce with those options enabled (see above).
> > >>>
> > >>>> Polling is fine. Please see attached for a script to poll at 15-second
> > >>>> intervals. Please also feel free to adjust, just tell me what you
> > >>>> adjusted.
> > >>>
> > >>> And should I now try to run that script on a config that triggers this
> > >>> issue (such as the config under 1) above)?
> > >>
> > >> Paul, can we see a dmesg from your running system? Perhaps there's some
> > >> dependency on a particular driver or device that makes this easier to
> > >> reproduce.
> > >
> > > Here you go, please see attached.
> > >
> > > I should have some additional diagnostics later today Pacific time.
> >
> > Heh sorry, _other_ Paul :-)
>
> ;-)
>
> > You are not seeing this issue, are you?
>
> No, I am not.
>
> > As per your earlier comment on sleeping under rcu_read_lock(), I checked
> > everything again and it seems sane. Would that not trigger an
> > immediately schedule-while-atomic in any case, regardless of RCU config?
>
> This used to be the case, but not anymore. Hence my additional
> diagnostics.
>
> The way that this used to happen is that CONFIG_PREEMPT=y and
> CONFIG_TREE_RCU=y would disable preemption across RCU read-side sections.
> Then you would see schedule-while-atomic.
>
> However, I recently changed the config setup so that CONFIG_PREEMPT=y
> and CONFIG_SMP=y forces CONFIG_TREE_PREEMPT_RCU=y, which avoids that
> diagnostic. So does CONFIG_SMP=n and CONFIG_TREE_RCU=y, in which case
> preemption is always disabled, so there is no way for the scheduler to
> see that there was anything atomic going on.
>
> So I am adding a similar diagnostic under CONFIG_PROVE_RCU.
>
> Rather embarrassing -- I totally missed the fact that this diagnostic
> would be lost. :-/
>
> Please see below for a sneak preview of the diagnostic patch.

And see below for one that might have a better chance of working
in modules.

Thanx, Paul

------------------------------------------------------------------------

rcu: Restore checks for blocking in RCU read-side critical sections

Long ago, using TREE_RCU with PREEMPT would result in "scheduling
while atomic" diagnostics if you blocked in an RCU read-side critical
section. However, PREEMPT now implies TREE_PREEMPT_RCU, which defeats
this diagnostic. This commit therefore adds a replacement diagnostic
based on PROVE_RCU.

Because rcu_lockdep_assert() and lockdep_rcu_dereference() are now being
used for things that have nothing to do with rcu_dereference(), rename
lockdep_rcu_dereference() to lockdep_rcu_suspicious() and add a third
argument that is a string indicating what is suspicious. This third
argument is passed in from a new third argument to rcu_lockdep_assert().
Update all calls to rcu_lockdep_assert() to add an informative third
argument.

Finally, add a pair of rcu_lockdep_assert() calls from within
rcu_note_context_switch(), one complaining if a context switch occurs
in an RCU-bh read-side critical section and another complaining if a
context switch occurs in an RCU-sched read-side critical section.
These are present only if the PROVE_RCU kernel parameter is enabled.

Again, you must enable PROVE_RCU to see these new diagnostics. But you
are enabling PROVE_RCU to check out new RCU uses in any case, aren't you?

Still-not-yet-signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 4aef1dd..bfa66e7 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -545,7 +545,7 @@ do { \
#endif

#ifdef CONFIG_PROVE_RCU
-extern void lockdep_rcu_dereference(const char *file, const int line);
+void lockdep_rcu_suspicious(const char *file, const int line, const char *s);
#endif

#endif /* __LINUX_LOCKDEP_H */
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 58b13f1..fb2933d 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -297,19 +297,20 @@ extern int rcu_my_thread_group_empty(void);
/**
* rcu_lockdep_assert - emit lockdep splat if specified condition not met
* @c: condition to check
+ * @s: informative message
*/
-#define rcu_lockdep_assert(c) \
+#define rcu_lockdep_assert(c, s) \
do { \
static bool __warned; \
if (debug_lockdep_rcu_enabled() && !__warned && !(c)) { \
__warned = true; \
- lockdep_rcu_dereference(__FILE__, __LINE__); \
+ lockdep_rcu_suspicious(__FILE__, __LINE__, s); \
} \
} while (0)

#else /* #ifdef CONFIG_PROVE_RCU */

-#define rcu_lockdep_assert(c) do { } while (0)
+#define rcu_lockdep_assert(c, s) do { } while (0)

#endif /* #else #ifdef CONFIG_PROVE_RCU */

@@ -338,14 +339,16 @@ extern int rcu_my_thread_group_empty(void);
#define __rcu_dereference_check(p, c, space) \
({ \
typeof(*p) *_________p1 = (typeof(*p)*__force )ACCESS_ONCE(p); \
- rcu_lockdep_assert(c); \
+ rcu_lockdep_assert(c, "suspicious rcu_dereference_check()" \
+ " usage"); \
rcu_dereference_sparse(p, space); \
smp_read_barrier_depends(); \
((typeof(*p) __force __kernel *)(_________p1)); \
})
#define __rcu_dereference_protected(p, c, space) \
({ \
- rcu_lockdep_assert(c); \
+ rcu_lockdep_assert(c, "suspicious rcu_dereference_protected()" \
+ " usage"); \
rcu_dereference_sparse(p, space); \
((typeof(*p) __force __kernel *)(p)); \
})
@@ -359,7 +362,9 @@ extern int rcu_my_thread_group_empty(void);
#define __rcu_dereference_index_check(p, c) \
({ \
typeof(p) _________p1 = ACCESS_ONCE(p); \
- rcu_lockdep_assert(c); \
+ rcu_lockdep_assert(c, \
+ "suspicious rcu_dereference_index_check()" \
+ " usage"); \
smp_read_barrier_depends(); \
(_________p1); \
})
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 53a6895..9789028 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -3798,7 +3798,7 @@ void lockdep_sys_exit(void)
}
}

-void lockdep_rcu_dereference(const char *file, const int line)
+void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
{
struct task_struct *curr = current;

@@ -3807,9 +3807,10 @@ void lockdep_rcu_dereference(const char *file, const int line)
return;
#endif /* #ifdef CONFIG_PROVE_RCU_REPEATEDLY */
/* Note: the following can be executed concurrently, so be careful. */
- printk("\n===================================================\n");
- printk( "[ INFO: suspicious rcu_dereference_check() usage. ]\n");
- printk( "---------------------------------------------------\n");
+ printk("\n===============================\n");
+ printk( "[ INFO: suspicious RCU usage. ]\n");
+ printk( "-------------------------------\n");
+ printk( "%s\n", s);
printk("%s:%d invoked rcu_dereference_check() without protection!\n",
file, line);
printk("\nother info that might help us debug this:\n\n");
@@ -3818,4 +3819,4 @@ void lockdep_rcu_dereference(const char *file, const int line)
printk("\nstack backtrace:\n");
dump_stack();
}
-EXPORT_SYMBOL_GPL(lockdep_rcu_dereference);
+EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious);
diff --git a/kernel/pid.c b/kernel/pid.c
index 57a8346..a7577b3 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -419,7 +419,9 @@ EXPORT_SYMBOL(pid_task);
*/
struct task_struct *find_task_by_pid_ns(pid_t nr, struct pid_namespace *ns)
{
- rcu_lockdep_assert(rcu_read_lock_held());
+ rcu_lockdep_assert(rcu_read_lock_held(),
+ "find_task_by_pid_ns() needs rcu_read_lock()"
+ " protection");
return pid_task(find_pid_ns(nr, ns), PIDTYPE_PID);
}

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index b2868ea..ec851db 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -153,6 +153,12 @@ void rcu_bh_qs(int cpu)
*/
void rcu_note_context_switch(int cpu)
{
+ rcu_lockdep_assert(!rcu_read_lock_bh_held(),
+ "Illegal context switch in RCU-bh"
+ " read-side critical section");
+ rcu_lockdep_assert(!rcu_read_lock_sched_held(),
+ "Illegal context switch in RCU-sched"
+ " read-side critical section");
rcu_sched_qs(cpu);
rcu_preempt_note_context_switch(cpu);
}
@@ -1675,11 +1681,15 @@ static int rcu_node_kthread(void *arg)

for (;;) {
rnp->node_kthread_status = RCU_KTHREAD_WAITING;
+ printk(KERN_INFO "rcun %p starting wait for work.\n", rnp);
rcu_wait(atomic_read(&rnp->wakemask) != 0);
+ printk(KERN_INFO "rcun %p completed wait for work.\n", rnp);
rnp->node_kthread_status = RCU_KTHREAD_RUNNING;
raw_spin_lock_irqsave(&rnp->lock, flags);
mask = atomic_xchg(&rnp->wakemask, 0);
+ printk(KERN_INFO "rcun %p initiating boost.\n", rnp);
rcu_initiate_boost(rnp, flags); /* releases rnp->lock. */
+ printk(KERN_INFO "rcun %p completed boost.\n", rnp);
for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask >>= 1) {
if ((mask & 0x1) == 0)
continue;
@@ -1689,10 +1699,12 @@ static int rcu_node_kthread(void *arg)
preempt_enable();
continue;
}
+ printk(KERN_INFO "rcun %p awaking rcuc%d.\n", rnp, cpu);
per_cpu(rcu_cpu_has_work, cpu) = 1;
sp.sched_priority = RCU_KTHREAD_PRIO;
sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
preempt_enable();
+ printk(KERN_INFO "rcun %p awakened rcuc%d.\n", rnp, cpu);
}
}
/* NOTREACHED */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/