Re: Mysterious CFQ crash and RCU

From: Paul E. McKenney
Date: Tue May 24 2011 - 00:14:52 EST


On Tue, May 24, 2011 at 12:20:40AM +0200, Paul Bolle wrote:
> On Mon, 2011-05-23 at 08:38 -0700, Paul E. McKenney wrote:
> > Running under CONFIG_PREEMPT=y (along with CONFIG_TREE_PREEMPT_RCU=y)
> > could be very helpful in and of itself. CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
> > can also be helpful. In post-2.6.39 mainline, it should be possible
> > to set CONFIG_DEBUG_OBJECTS_RCU_HEAD=y without CONFIG_PREEMPT=y, but
> > again, CONFIG_PREEMPT=y can help find problems.
>
> 0) The first thing I tried (from your suggestions) was
> CONFIG_DEBUG_OBJECTS_RCU_HEAD=y. Given its dependencies (and, well, the
> build system I used) I ended up with:
>
> $ grep -e PREEMPT -e RCU /boot/config-2.6.39-0.local3.fc16.i686 |
> grep -v "^#"
> CONFIG_TREE_PREEMPT_RCU=y
> CONFIG_PREEMPT_RCU=y
> CONFIG_RCU_FANOUT=32
> CONFIG_PREEMPT_NOTIFIERS=y
> CONFIG_PREEMPT=y
> CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
> CONFIG_DEBUG_PREEMPT=y
> CONFIG_PROVE_RCU=y
> CONFIG_SPARSE_RCU_POINTER=y
>
> It looks like I am unable to trigger the issue we're talking about here
> when using that config.

Interesting. One way for a problem to show up under CONFIG_TREE_RCU but
not under CONFIG_TREE_PREEMPT_RCU is to block inside an RCU read-side
critical section. I need to think about appropriate diagnostics for
this.
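
When comparing results across kernels, it can help to confirm which RCU
implementation each config actually selects. The following is only a
minimal sketch, assuming config files laid out like the ones grepped
above; the helper name rcu_flavor is hypothetical and is not part of any
existing tool.

```shell
#!/bin/sh
# rcu_flavor (hypothetical helper): report which RCU implementation a
# kernel config file selects, based on the same options grepped above.
# Usage: rcu_flavor /boot/config-<version>
rcu_flavor() {
    if grep -q '^CONFIG_TREE_PREEMPT_RCU=y' "$1"; then
        # Preemptible RCU: requires CONFIG_PREEMPT=y.
        echo "preemptible"
    elif grep -q '^CONFIG_TREE_RCU=y' "$1"; then
        # Non-preemptible tree RCU.
        echo "non-preemptible"
    else
        # Neither option is set to y in this file.
        echo "unknown"
    fi
}
```

Run against the config quoted above
(/boot/config-2.6.39-0.local3.fc16.i686), this would print
"preemptible", because that file sets CONFIG_TREE_PREEMPT_RCU=y.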

> 1) For reference, the config of a kernel that does trigger it had:
>
> $ grep -e PREEMPT -e RCU /boot/config-2.6.39-0.local2.fc16.i686 |
> grep -v "^#"
> CONFIG_TREE_RCU=y
> CONFIG_RCU_FANOUT=32
> CONFIG_RCU_FAST_NO_HZ=y
> CONFIG_PREEMPT_NOTIFIERS=y
> CONFIG_PREEMPT_VOLUNTARY=y
> CONFIG_PROVE_RCU=y
> CONFIG_SPARSE_RCU_POINTER=y
>
> > > Again CONFIG_TREE_PREEMPT_RCU is available only if PREEMPT=y. So should
> > > we enable preemption and CONFIG_TREE_PREEMPT_RCU=y and try to reproduce
> > > the issue?
> >
> > Please!
>
> 2) It appears I can't reproduce with those options enabled (see above).
>
> > Polling is fine. Please see attached for a script to poll at 15-second
> > intervals. Please also feel free to adjust, just tell me what you
> > adjusted.
>
> And should I now try to run that script on a config that triggers this
> issue (such as the config under 1) above)?

It might help while I am working out more targeted diagnostics.

Thanx, Paul