Re: [PATCH] rcu: Make jiffies_till_sched_qs writable
From: Paul E. McKenney
Date: Sat Jul 13 2019 - 13:42:36 EST
On Sat, Jul 13, 2019 at 11:42:57AM -0400, Joel Fernandes wrote:
> On Sat, Jul 13, 2019 at 08:13:30AM -0700, Paul E. McKenney wrote:
> > On Sat, Jul 13, 2019 at 10:20:02AM -0400, Joel Fernandes wrote:
> > > On Sat, Jul 13, 2019 at 4:47 AM Byungchul Park
> > > <max.byungchul.park@xxxxxxxxx> wrote:
> > > >
> > > > On Fri, Jul 12, 2019 at 9:51 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, Jul 12, 2019 at 03:32:40PM +0900, Byungchul Park wrote:
> > > > > > On Thu, Jul 11, 2019 at 03:58:39PM -0400, Joel Fernandes wrote:
> > > > > > > Hmm, speaking of grace period durations, it seems to me the maximum grace
> > > > > > > period ever is recorded in rcu_state.gp_max. However it is not read from
> > > > > > > anywhere.
> > > > > > >
> > > > > > > Any idea why it was added but not used?
> > > > > > >
> > > > > > > I am interested in dumping this value just for fun, and seeing what I get.
> > > > > > >
> > > > > > > I wonder also it is useful to dump it in rcutorture/rcuperf to find any
> > > > > > > issues, or even expose it in sys/proc fs to see what worst case grace periods
> > > > > > > look like.
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > commit ae91aa0adb14dc33114d566feca2f7cb7a96b8b7
> > > > > > rcu: Remove debugfs tracing
> > > > > >
> > > > > > removed all debugfs tracing, gp_max also included.
> > > > > >
> > > > > > And you sounds great. And even looks not that hard to add it like,
> > > > > >
> > > > > > :)
> > > > > >
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index ad9dc86..86095ff 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -1658,8 +1658,10 @@ static void rcu_gp_cleanup(void)
> > > > > > raw_spin_lock_irq_rcu_node(rnp);
> > > > > > rcu_state.gp_end = jiffies;
> > > > > > gp_duration = rcu_state.gp_end - rcu_state.gp_start;
> > > > > > - if (gp_duration > rcu_state.gp_max)
> > > > > > + if (gp_duration > rcu_state.gp_max) {
> > > > > > rcu_state.gp_max = gp_duration;
> > > > > > + trace_rcu_grace_period(something something);
> > > > > > + }
> > > > >
> > > > > Yes, that makes sense. But I think it is much better off as a readable value
> > > > > from a virtual fs. The drawback of tracing for this sort of thing are:
> > > > > - Tracing will only catch it if tracing is on
> > > > > - Tracing data can be lost if too many events, then no one has a clue what
> > > > > the max gp time is.
> > > > > - The data is already available in rcu_state::gp_max so copying it into the
> > > > > trace buffer seems a bit pointless IMHO
> > > > > - It is a lot easier on ones eyes to process a single counter than process
> > > > > heaps of traces.
> > > > >
> > > > > I think a minimal set of RCU counters exposed to /proc or /sys should not
> > > > > hurt and could do more good than not. The scheduler already does this for
> > > > > scheduler statistics. I have seen Peter complain a lot about new tracepoints
> > > > > but not much (or never) about new statistics.
> > > > >
> > > > > Tracing has its strengths but may not apply well here IMO. I think a counter
> > > > > like this could be useful for tuning of things like the jiffies_*_sched_qs,
> > > > > the stall timeouts and also any other RCU knobs. What do you think?
> > > >
> > > > I prefer proc/sys knob for it to tracepoint. Why I've considered it is just it
> > > > looks like undoing what Paul did at ae91aa0ad.
> > > >
> > > > I think you're rational enough. I just wondered how Paul think of it.
> > >
> > > I believe at least initially, a set of statistics can be made
> > > available only when rcutorture or rcuperf module is loaded. That way
> > > they are purely only for debugging and nothing needs to be exposed to
> > > normal kernels distributed thus reducing testability concerns.
> > >
> > > rcu_state::gp_max would be trivial to expose through this, but for
> > > other statistics that are more complicated - perhaps
> > > tracepoint_probe_register can be used to add hooks on to the
> > > tracepoints and generate statistics from them. Again the registration
> > > of the probe and the probe handler itself would all be in
> > > rcutorture/rcuperf test code and not a part of the kernel proper.
> > > Thoughts?
> >
> > It still feels like you guys are hyperfocusing on this one particular
> > knob. I instead need you to look at the interrelating knobs as a group.
>
> Thanks for the hints, we'll do that.
>
> > On the debugging side, suppose someone gives you an RCU bug report.
> > What information will you need? How can you best get that information
> > without excessive numbers of over-and-back interactions with the guy
> > reporting the bug? As part of this last question, what information is
> > normally supplied with the bug? Alternatively, what information are
> > bug reporters normally expected to provide when asked?
>
> I suppose I could dig out some of our Android bug reports of the past where
> there were RCU issues but if there's any fires you are currently fighting do
> send it our way as debugging homework ;-)
Evading the questions, are we?
OK, I can be flexible. Suppose that you were getting RCU CPU stall
warnings featuring multi_cpu_stop() called from cpu_stopper_thread().
Of course, this really means that some other CPU/task is holding up
multi_cpu_stop() without also blocking the current grace period.
What is the best way to work out what is really holding things up?
Thanx, Paul