Re: [rcu] c0f4dfd4f9: -65% softirqs.RCU
From: Fengguang Wu
Date: Mon Jan 27 2014 - 21:59:25 EST
On Mon, Jan 27, 2014 at 09:06:02AM -0800, Paul E. McKenney wrote:
> On Fri, Jan 24, 2014 at 07:11:30PM +0800, Fengguang Wu wrote:
> > On Mon, Jan 20, 2014 at 08:41:00PM -0800, Paul E. McKenney wrote:
> > > On Mon, Jan 20, 2014 at 08:29:12PM +0800, Fengguang Wu wrote:
> > > > On Sun, Jan 19, 2014 at 03:11:14PM -0800, Paul E. McKenney wrote:
> > > > > On Sun, Jan 19, 2014 at 08:16:08PM +0800, Fengguang Wu wrote:
> > > > > > Hi Paul,
> > > > > >
> > > > > > Just FYI, we noticed the following changes (which look good) on old commit
> > > > > > c0f4dfd4f9 ("rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks")
> > > > > > in test case dd-write/4HDD-JBOD-cfq-btrfs-1dd:
> > > > > >
> > > > > > b11cc5 (parent)  c0f4dfd4f90f1667d234d21f1
> > > > > > ---------------  -------------------------
> > > > > >     213757 ~ 4%     -65.4%      73929 ~ 3%  softirqs.RCU
> > > > > >      21193 ~ 5%     -36.5%      13451 ~ 4%  softirqs.SCHED
> > > > > >       2036 ~ 4%     -59.4%        825 ~ 3%  vmstat.system.cs
> > > > > >    1304520 ~ 4%     -59.2%     532451 ~ 3%  perf-stat.context-switches
> > > > > >      95685 ~ 4%     -44.0%      53598 ~ 2%  perf-stat.cpu-migrations
> > > > >
> > > > > Glad it helped! IIRC, this same commit increased latencies due to
> > > > > synchronize_rcu() latency increasing. So this is the good side of
> > > > > that other not-so-good result. ;-)
> > > >
> > > > If you care about it and there is a low-cost way for user space to get that
> > > > synchronize_rcu() latency, I'd be eager to collect it in my tests. :)
> > >
> > > Would a kernel module that measured the latency be OK, or do you need
> > > some system call that is exposed to synchronize_rcu() latency?
> >
> > Kernel module should be good enough for me. Perhaps something like
> > kernel/latencytop.c?
>
> So you are looking for something that measures synchronize_rcu() latency
> for the synchronize_rcu() calls that occur naturally in the kernel, rather
> than having a focused microbenchmark?

Yes. Then I can measure synchronize_rcu() latency in all the tests
I run, including any focused RCU microbenchmarks. :)

BTW, I've measured the overhead of CONFIG_SCHEDSTATS, which is
required for running latencytop, and it seems acceptable: throughput
drops by 0.2% to 4.1% in the tests below.

     x86_64-lkp  x86_64-lkp+CONFIG_SCHEDST
---------------  -------------------------
    174190 ~ 0%      -4.1%     167062 ~ 0%  lkp-snb01/micro/hackbench/1600%-threads-pipe
    158995 ~ 1%      -3.1%     154094 ~ 0%  lkp-snb01/micro/hackbench/1600%-threads-socket
    333186 ~ 1%      -3.6%     321156 ~ 0%  TOTAL hackbench.throughput

     x86_64-lkp  x86_64-lkp+CONFIG_SCHEDST
---------------  -------------------------
       278 ~ 0%      -3.4%        269 ~ 0%  lkp-a04/micro/netperf/120s-200%-TCP_MAERTS
       632 ~ 1%      -2.9%        613 ~ 1%  lkp-a04/micro/netperf/120s-200%-TCP_SENDFILE
       280 ~ 1%      -3.7%        270 ~ 0%  lkp-a04/micro/netperf/120s-200%-TCP_STREAM
      1191 ~ 1%      -3.2%       1153 ~ 1%  TOTAL netperf.Throughput_Mbps

     x86_64-lkp  x86_64-lkp+CONFIG_SCHEDST
---------------  -------------------------
       386 ~ 0%      -2.1%        378 ~ 0%  lkp-a04/micro/netperf/120s-200%-TCP_CRR
      2057 ~ 0%      -3.6%       1982 ~ 0%  lkp-a04/micro/netperf/120s-200%-TCP_RR
      2518 ~ 0%      -1.4%       2482 ~ 0%  lkp-a04/micro/netperf/120s-200%-UDP_RR
      4962 ~ 0%      -2.4%       4843 ~ 0%  TOTAL netperf.Throughput_tps

     x86_64-lkp  x86_64-lkp+CONFIG_SCHEDST
---------------  -------------------------
  37316711 ~ 0%      -0.9%   36976450 ~ 0%  nhm-white/sysbench/oltp/600s-100%-1000000
  37316711 ~ 0%      -0.9%   36976450 ~ 0%  TOTAL oltp.rw_requets

     x86_64-lkp  x86_64-lkp+CONFIG_SCHEDST
---------------  -------------------------
   2665479 ~ 0%      -0.9%    2641175 ~ 0%  nhm-white/sysbench/oltp/600s-100%-1000000
   2665479 ~ 0%      -0.9%    2641175 ~ 0%  TOTAL oltp.transactions

     x86_64-lkp  x86_64-lkp+CONFIG_SCHEDST
---------------  -------------------------
     68.50 ~ 0%      -0.2%      68.39 ~ 0%  xps2/micro/pigz/100%
     68.50 ~ 0%      -0.2%      68.39 ~ 0%  TOTAL pigz.throughput

Thanks,
Fengguang