Re: TREE_SRCU slows hotplug by factor ~16

From: Paul E. McKenney
Date: Wed Apr 26 2017 - 11:44:57 EST


On Wed, Apr 26, 2017 at 05:26:20PM +0200, Mike Galbraith wrote:
> On Wed, 2017-04-26 at 07:31 -0700, Paul E. McKenney wrote:
>
> > And a sneak preview, semi-tested. If you get a chance to run this, please
> > let me know how it goes.
>
> That took 'time stress-cpu-hotplug.sh' down to 48s, close to classic.

Woo-hoo!!! ;-)

And thank you for your testing efforts!

Should I be comparing this with the 55s number from your initial email,
or to the 39s number?

Either way, given the unusual nature of Steven's hotplug stress test,
I believe that I am good enough for this merge window. But if we
are talking 48s for Tree SRCU vs. 39s with Classic SRCU, it would be
good to at least understand where the remaining slowdown is. Here
are a couple of possible causes:

o  My holdoff is too long. I set it to 50 microseconds based
   on your trace, which shows a minimum grace-period separation
   of 118 microseconds. But perhaps the trace was too short to
   show the full variation. One way to check this is to run with
   srcutree.exp_holdoff=25000 or some such. (Please note that
   srcutree.exp_holdoff is in nanoseconds, -not- microseconds.)

o  My expedited throttling is too aggressive. This is controlled
   by the following lines of code in srcu_gp_end() in the file
   kernel/rcu/srcutree.c:

        /* Throttle expedited grace periods: Should be rare! */
        srcu_reschedule(sp, rcu_seq_ctr(gpseq) & 0x3ff
                            ? 0 : SRCU_INTERVAL);

   The "0x3ff" says that one in 1024 grace periods should be
   forced to be at least partially non-expedited, regardless
   of anything else. If changing this to (say) "0xfff" (one in
   4096) gets you three-quarters of the way to the 39s, that
   indicates that this is the controlling factor.

o  Of course, another question is how much variation there is
   in the timing of that stress test.

If further reduction is needed, and none of these help, could you
please send me a trace of the full run of the same form as the last
one you sent me, covering calls to and returns from call_srcu(),
synchronize_srcu(), and synchronize_srcu_expedited()?

Thanx, paul