Re: RCU stall when using function_graph

From: Paul E. McKenney
Date: Thu Aug 10 2017 - 17:39:50 EST


On Thu, Aug 10, 2017 at 11:45:09AM +0200, Daniel Lezcano wrote:
> On 09/08/2017 19:22, Paul E. McKenney wrote:
> > On Wed, Aug 09, 2017 at 05:51:33PM +0200, Daniel Lezcano wrote:
> >> On 09/08/2017 16:40, Paul E. McKenney wrote:
> >>> On Wed, Aug 09, 2017 at 03:28:05PM +0200, Daniel Lezcano wrote:
> >>>> On 09/08/2017 14:58, Paul E. McKenney wrote:
> >>>>> On Wed, Aug 09, 2017 at 02:43:49PM +0530, Pratyush Anand wrote:
> >>>>>>
> >>>>>>
> >>>>>> On Sunday 06 August 2017 10:32 PM, Paul E. McKenney wrote:
> >>>>>>> On Sat, Aug 05, 2017 at 02:24:21PM +0900, êëí wrote:
> >>>>>>>> Dear All
> >>>>>>>>
> >>>>>>>> As for me, after configuring function_graph as below, crash disappears.
> >>>>>>>> "echo 0 > d/tracing/tracing_on"
> >>>>>>>> "sleep 1"
> >>>>>>>>
> >>>>>>>> "echo function_graph > d/tracing/current_tracer"
> >>>>>>>> "sleep 1"
> >>>>>>>>
> >>>>>>>> "echo smp_call_function_single > d/tracing/set_ftrace_filter"
> >>>>>>
> >>>>>> It will limit trace output to only for the filtered function
> >>>>>> (smp_call_function_single).
> >>>>>>
> >>>>>>>> adb shell "sleep 1"
> >>>>>>>>
> >>>>>>>> "echo 1 > d/tracing/tracing_on"
> >>>>>>>> adb shell "sleep 1"
> >>>>>>>>
> >>>>>>>> Right after function_graph is enabled, too many logs are traced upon IRQ
> >>>>>>>> transaction which many times eventually causes stall.
> >>>>>>>
> >>>>>>> That would do it!
> >>>>>>>
> >>>>>>> Hmmm...
> >>>>>>>
> >>>>>>> Steven, would it be helpful if RCU were to inform tracing (say) halfway
> >>>>>>> through the RCU CPU stall interval, allowing the tracer to do something
> >>>>>>> like cond_resched_rcu_qs()? I can imagine all sorts of reasons why this
> >>>>>>> wouldn't work, for example, if all the tracing was with irqs disabled
> >>>>>>> or some such, but figured I should ask.
> >>>>>>>
> >>>>>>> Does Guillermo's approach work for others?
> >>>>>>
> >>>>>> Limited output with a couple of filtered function will definitely
> >>>>>> not cause RCU schedule stall. But the question is whether we should
> >>>>>> expect a full function graph trace working on every platform or not
> >>>>>> (specially the one which generates high interrupts)?
> >>>>>
> >>>>> It might well be that the user must disable RCU CPU stall warnings via
> >>>>> the rcu_cpu_stall_suppress sysfs entry (or increase their timeout via th
> >>>>> rcu_cpu_stall_timeout sysfs entry) before doing something that greatly
> >>>>> increases overhead. Like enabling large quantities of tracing. ;-)
> >>>>>
> >>>>> It -might- be possible to do this automatically, but reliable
> >>>>> automation would require that tracing understand how often each
> >>>>> function was called, which sounds to me to be a bit of a stretch.
> >>>>>
> >>>>> Thoughts?
> >>>>
> >>>> A random thought:
> >>>>
> >>>> Is it possible to have a mid-timeout happening and store some
> >>>> information like the instruction pointer, so when the timeout happen we
> >>>> can compare if there was some progress, if yes, very likely, system
> >>>> performance collapsed and we are not fast enough.
> >>>
> >>> RCU already does take various actions for an impending stall, so something
> >>> could be done. But in most slowdowns, the instruction pointer will be
> >>> changing rapidly, just not as rapidly as it would normally. So exactly
> >>> how would the forward-progress comparison be carried out?
> >>>
> >>> It would be easy to set up a notifier, so that if any notifier in the
> >>> chain returned an error, stall warnings would be suppressed. It would
> >>> be harder to figure out when to re-enable them, though I suppose that
> >>> they could be suppressed only for the duration of the current grace
> >>> period or some such.
> >>>
> >>> But what exactly would you use such a notifier for?
> >>>
> >>> Or am I misunderstanding your suggestion?
> >>
> >> Well, may be the instruction pointer thing is not a good idea.
> >>
> >> I learnt from this experience, an overloaded kernel with a lot of
> >> interrupts can hang the console and issue RCU stall.
> >>
> >> However, someone else can face the same situation. Even if he reads the
> >> RCU/stallwarn.txt documentation, it will be hard to figure out the issue.
> >>
> >> A message telling the grace period can't be reached because we are too
> >> busy processing interrupts would have helped but I understand it is not
> >> easy to implement.
> >>
> >> Perhaps, adding a new bullet in the documentation can help:
> >>
> >> "If the interrupt processing time is longer than the interval between
> >> each interrupt, the CPU will keep processing the interrupts without
> >> allowing the RCU's grace period kthread. This situation can happen if
> >> there is a highly rated number of interrupts and the function_graph
> >> tracer is enabled".
> >
> > How about this?
>
> Yes, it is clear. Thanks for reformulating it.
>
> > Any other debugging options that should be called out? I bet that
> > the function_graph tracer isn't the only way to make this happen.
>
> Nothing coming in mind but may be worth to mention the slowness of the
> CPU is the aggravating factor. In particular I was able to reproduce the
> issue by setting to the min CPU frequency. With the ondemand governor,
> we can have the frequency high (hence enough CPU power) at the moment we
> set the function_graph because another CPU is loaded (and both CPUs are
> sharing the same clock line). The system became stuck at the moment the
> other CPU went idle with the lowest frequency. That introduced
> randomness in the issue and made hard to figure out why the RCU stall
> was happening.

Adding this, then?

Thanx, Paul

------------------------------------------------------------------------

commit f7d9ce95064f76be583c775fac32076fa59f1617
Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
Date: Thu Aug 10 14:33:17 2017 -0700

documentation: Slow systems can stall RCU grace periods

If a fast system has a worst-case grace-period duration of (say) ten
seconds, then running the same workload on a system ten times as slow
will get you an RCU CPU stall warning given default stall-warning
timeout settings. This commit therefore adds this possibility to
stallwarn.txt.

Reported-by: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>

diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 21b8913acbdf..238acbd94917 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -70,6 +70,12 @@ o A periodic interrupt whose handler takes longer than the time
considerably longer than normal, which can in turn result in
RCU CPU stall warnings.

+o Testing a workload on a fast system, tuning the stall-warning
+ timeout down to just barely avoid RCU CPU stall warnings, and then
+ running the same workload with the same stall-warning timeout on a
+ slow system. Note that thermal throttling and on-demand governors
+ can cause a single system to be sometimes fast and sometimes slow!
+
o A hardware or software issue shuts off the scheduler-clock
interrupt on a CPU that is not in dyntick-idle mode. This
problem really has happened, and seems to be most likely to