Re: RCU stall when using function_graph

From: Daniel Lezcano
Date: Thu Aug 03 2017 - 07:41:23 EST


On 02/08/2017 15:07, Steven Rostedt wrote:
> On Wed, 2 Aug 2017 14:42:39 +0200
> Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> wrote:
>
>> On Tue, Aug 01, 2017 at 08:12:14PM -0400, Steven Rostedt wrote:
>>> On Wed, 2 Aug 2017 00:15:44 +0200
>>> Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> wrote:
>>>
>>>> On 02/08/2017 00:04, Paul E. McKenney wrote:
>>>>>> Hi Paul,
>>>>>>
>>>>>> I have been trying to set the function_graph tracer for ftrace and each time I
>>>>>> get a CPU stall.
>>>>>>
>>>>>> How to reproduce:
>>>>>> -----------------
>>>>>>
>>>>>> echo function_graph > /sys/kernel/debug/tracing/current_tracer
>>>>>>
>>>>>> This error appears with v4.13-rc3 and v4.12-rc6.
>>>
>>> Can you bisect this? It may be due to this commit:
>>>
>>> 0598e4f08 ("ftrace: Add use of synchronize_rcu_tasks() with dynamic trampolines")
>>
>> Hi Steve,
>>
>> I git bisected but each time the issue occured. I went through the different
>> version down to v4.4 where the board was not fully supported and it ended up to
>> have the same issue.
>>
>> Finally, I had the intuition it could be related to the wall time (there is no
>> RTC clock with battery on the board and the wall time is Jan 1st, 1970).
>>
>> Setting up the with ntpdate solved the problem.

Actually, it did not solve the problem. The function_graph trace is set,
I can use the system but after awhile (no tracing enabled at anytime),
the stall appears.

>> Even if it is rarely the case to have the time not set, is it normal to have a
>> RCU cpu stall ?
>>
>>
>
> BTW, function_graph tracer is the most invasive of the tracers. It's 4x
> slower than function tracer. I'm wondering if the tracer isn't the
> cause, but just slows things down enough to cause a some other race
> condition that triggers the bug.

Yes, that could be true.

I tried the following scenario:

- cpufreq governor => userspace + max_freq (1.2GHz)
- function_graph set ==> OK

- cpufreq governor => userspace + min_freq (200MHz)
- function_graph set ==> RCU stall

Beside that, I realize the board is constantly processing SOF interrupts
every 124us, so that adds more overhead.

Removing the USB support, thus the associated processing for the SOF
interrupts, I don't see anymore the RCU stall.

Is it the expected behavior to have the system hang after a RCU stall
raises ?

--
<http://www.linaro.org/> Linaro.org â Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog