Re: Mysterious CFQ crash and RCU

From: Jens Axboe
Date: Tue May 24 2011 - 10:51:53 EST


On 2011-05-24 16:35, Paul E. McKenney wrote:
> On Tue, May 24, 2011 at 11:41:10AM +0200, Jens Axboe wrote:
>> On 2011-05-24 00:20, Paul Bolle wrote:
>>> On Mon, 2011-05-23 at 08:38 -0700, Paul E. McKenney wrote:
>>>> Running under CONFIG_PREEMPT=y (along with CONFIG_TREE_PREEMPT_RCU=y)
>>>> could be very helpful in and of itself. CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
>>>> can also be helpful. In post-2.6.39 mainline, it should be possible
>>>> to set CONFIG_DEBUG_OBJECTS_RCU_HEAD=y without CONFIG_PREEMPT=y, but
>>>> again, CONFIG_PREEMPT=y can help find problems.
>>>
>>> 0) The first thing I tried (from your suggestions) was
>>> CONFIG_DEBUG_OBJECTS_RCU_HEAD=y. Given its dependencies (and, well, the
>>> build system I used) I ended up with:
>>>
>>> $ grep -e PREEMPT -e RCU /boot/config-2.6.39-0.local3.fc16.i686 |
>>> grep -v "^#"
>>> CONFIG_TREE_PREEMPT_RCU=y
>>> CONFIG_PREEMPT_RCU=y
>>> CONFIG_RCU_FANOUT=32
>>> CONFIG_PREEMPT_NOTIFIERS=y
>>> CONFIG_PREEMPT=y
>>> CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
>>> CONFIG_DEBUG_PREEMPT=y
>>> CONFIG_PROVE_RCU=y
>>> CONFIG_SPARSE_RCU_POINTER=y
>>>
>>> It looks like I am unable to trigger the issue we're talking about here
>>> when using that config.
>>>
>>> 1) For reference, the config of a kernel that does trigger it had:
>>>
>>> $ grep -e PREEMPT -e RCU /boot/config-2.6.39-0.local2.fc16.i686 |
>>> grep -v "^#"
>>> CONFIG_TREE_RCU=y
>>> CONFIG_RCU_FANOUT=32
>>> CONFIG_RCU_FAST_NO_HZ=y
>>> CONFIG_PREEMPT_NOTIFIERS=y
>>> CONFIG_PREEMPT_VOLUNTARY=y
>>> CONFIG_PROVE_RCU=y
>>> CONFIG_SPARSE_RCU_POINTER=y
>>>
>>>>> Again CONFIG_TREE_PREEMPT_RCU is available only if PREEMPT=y. So should
>>>>> we enable preemption and CONFIG_TREE_PREEMPT_RCU=y and try to reproduce
>>>>> the issue?
>>>>
>>>> Please!
>>>
>>> 2) It appears I can't reproduce with those options enabled (see above).
>>>
>>>> Polling is fine. Please see attached for a script to poll at 15-second
>>>> intervals. Please also feel free to adjust, just tell me what you
>>>> adjusted.
>>>
>>> And should I now try to run that script on a config that triggers this
>>> issue (such as the config under 1) above)?
>>
>> Paul, can we see a dmesg from your running system? Perhaps there's some
>> dependency on a particular driver or device that makes this easier to
>> reproduce.
>
> Here you go, please see attached.
>
> I should have some additional diagnostics later today Pacific time.

Heh sorry, _other_ Paul :-)

You are not seeing this issue, are you?

As per your earlier comment on sleeping under rcu_read_lock(), I checked
everything again and it seems sane. Would that not trigger an
immediate schedule-while-atomic in any case, regardless of RCU config?

--
Jens Axboe
