Re: [LKP] [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5% perf-stat.LLC-load-misses

From: Yuanhan Liu
Date: Thu Apr 30 2015 - 02:24:04 EST


On Fri, Apr 24, 2015 at 12:15:59PM +1000, NeilBrown wrote:
> On Thu, 23 Apr 2015 14:55:59 +0800 Huang Ying <ying.huang@xxxxxxxxx> wrote:
>
> > FYI, we noticed the below changes on
> >
> > git://neil.brown.name/md for-next
> > commit 878ee6792799e2f88bdcac329845efadb205252f ("RAID5: batch adjacent full stripe write")
>
> Hi,
> is there any chance that you could explain what some of this means?
> There is lots of data and some very pretty graphs, but no explanation.

Hi Neil,

(Sorry for the late response: Ying is on vacation.)

I guess you can simply ignore this report, as I already reported to you a
month ago that this patch makes fsmark perform better in most cases:

https://lists.01.org/pipermail/lkp/2015-March/002411.html

>
> Which numbers are "good", which are "bad"? Which is "worst".
> What do the graphs really show? and what would we like to see in them?
>
> I think it is really great that you are doing this testing and reporting the
> results. It's just so sad that I completely fail to understand them.

Sorry, that's our fault: the reports are hard to understand, and this one
is a duplicate (well, the commit hash is different ;).

We might need to take some time to make the data easier to understand.
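
In the meantime, a rough legend for the comparison table: the left column is
the parent commit (a87d7f782b47e030) and the right column is with your patch
(878ee679279) applied; %stddev is the deviation across repeat runs of each
metric, and %change is (patched - parent) / parent. As a quick sanity check
of the number in the subject line (just an illustration, not part of the
lkp-tests tooling):

  # perf-stat.LLC-load-misses went from 5.571e+08 to 7.826e+08
  python -c 'print("%+.1f%%" % ((7.826e8 - 5.571e8) / 5.571e8 * 100))'
  # prints: +40.5%

Whether a given change is "good" or "bad" depends on the metric: for
throughput counters like vmstat.io.bo higher is better, while for cache
misses and context switches lower is better.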

--yliu

>
> >
> >
> > testbox/testcase/testparams: lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd
> >
> > a87d7f782b47e030 878ee6792799e2f88bdcac3298
> > ---------------- --------------------------
> > %stddev %change %stddev
> > \ | \
> > 59035 ± 0% +18.4% 69913 ± 1% softirqs.SCHED
> > 1330 ± 10% +17.4% 1561 ± 4% slabinfo.kmalloc-512.num_objs
> > 1330 ± 10% +17.4% 1561 ± 4% slabinfo.kmalloc-512.active_objs
> > 305908 ± 0% -1.8% 300427 ± 0% vmstat.io.bo
> > 1 ± 0% +100.0% 2 ± 0% vmstat.procs.r
> > 8266 ± 1% -15.7% 6968 ± 0% vmstat.system.cs
> > 14819 ± 0% -2.1% 14503 ± 0% vmstat.system.in
> > 18.20 ± 6% +10.2% 20.05 ± 4% perf-profile.cpu-cycles.raid_run_ops.handle_stripe.handle_active_stripes.raid5d.md_thread
> > 1.94 ± 9% +90.6% 3.70 ± 9% perf-profile.cpu-cycles.async_xor.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> > 0.00 ± 0% +Inf% 25.18 ± 3% perf-profile.cpu-cycles.handle_active_stripes.isra.45.raid5d.md_thread.kthread.ret_from_fork
> > 0.00 ± 0% +Inf% 14.14 ± 4% perf-profile.cpu-cycles.async_copy_data.isra.42.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> > 1.79 ± 7% +102.9% 3.64 ± 9% perf-profile.cpu-cycles.xor_blocks.async_xor.raid_run_ops.handle_stripe.handle_active_stripes
> > 3.09 ± 4% -10.8% 2.76 ± 4% perf-profile.cpu-cycles.get_active_stripe.make_request.md_make_request.generic_make_request.submit_bio
> > 0.80 ± 14% +28.1% 1.02 ± 10% perf-profile.cpu-cycles.mutex_lock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> > 14.78 ± 6% -100.0% 0.00 ± 0% perf-profile.cpu-cycles.async_copy_data.isra.38.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> > 25.68 ± 4% -100.0% 0.00 ± 0% perf-profile.cpu-cycles.handle_active_stripes.isra.41.raid5d.md_thread.kthread.ret_from_fork
> > 1.23 ± 5% +140.0% 2.96 ± 7% perf-profile.cpu-cycles.xor_sse_5_pf64.xor_blocks.async_xor.raid_run_ops.handle_stripe
> > 2.62 ± 6% -95.6% 0.12 ± 33% perf-profile.cpu-cycles.analyse_stripe.handle_stripe.handle_active_stripes.raid5d.md_thread
> > 0.96 ± 9% +17.5% 1.12 ± 2% perf-profile.cpu-cycles.xfs_ilock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> > 1.461e+10 ± 0% -5.3% 1.384e+10 ± 1% perf-stat.L1-dcache-load-misses
> > 3.688e+11 ± 0% -2.7% 3.59e+11 ± 0% perf-stat.L1-dcache-loads
> > 1.124e+09 ± 0% -27.7% 8.125e+08 ± 0% perf-stat.L1-dcache-prefetches
> > 2.767e+10 ± 0% -1.8% 2.717e+10 ± 0% perf-stat.L1-dcache-store-misses
> > 2.352e+11 ± 0% -2.8% 2.287e+11 ± 0% perf-stat.L1-dcache-stores
> > 6.774e+09 ± 0% -2.3% 6.62e+09 ± 0% perf-stat.L1-icache-load-misses
> > 5.571e+08 ± 0% +40.5% 7.826e+08 ± 1% perf-stat.LLC-load-misses
> > 6.263e+09 ± 0% -13.7% 5.407e+09 ± 1% perf-stat.LLC-loads
> > 1.914e+11 ± 0% -4.2% 1.833e+11 ± 0% perf-stat.branch-instructions
> > 1.145e+09 ± 2% -5.6% 1.081e+09 ± 0% perf-stat.branch-load-misses
> > 1.911e+11 ± 0% -4.3% 1.829e+11 ± 0% perf-stat.branch-loads
> > 1.142e+09 ± 2% -5.1% 1.083e+09 ± 0% perf-stat.branch-misses
> > 1.218e+09 ± 0% +19.8% 1.46e+09 ± 0% perf-stat.cache-misses
> > 2.118e+10 ± 0% -5.2% 2.007e+10 ± 0% perf-stat.cache-references
> > 2510308 ± 1% -15.7% 2115410 ± 0% perf-stat.context-switches
> > 39623 ± 0% +22.1% 48370 ± 1% perf-stat.cpu-migrations
> > 4.179e+08 ± 40% +165.7% 1.111e+09 ± 35% perf-stat.dTLB-load-misses
> > 3.684e+11 ± 0% -2.5% 3.592e+11 ± 0% perf-stat.dTLB-loads
> > 1.232e+08 ± 15% +62.5% 2.002e+08 ± 27% perf-stat.dTLB-store-misses
> > 2.348e+11 ± 0% -2.5% 2.288e+11 ± 0% perf-stat.dTLB-stores
> > 3577297 ± 2% +8.7% 3888986 ± 1% perf-stat.iTLB-load-misses
> > 1.035e+12 ± 0% -3.5% 9.988e+11 ± 0% perf-stat.iTLB-loads
> > 1.036e+12 ± 0% -3.7% 9.978e+11 ± 0% perf-stat.instructions
> > 594 ± 30% +130.3% 1369 ± 13% sched_debug.cfs_rq[0]:/.blocked_load_avg
> > 17 ± 10% -28.2% 12 ± 23% sched_debug.cfs_rq[0]:/.nr_spread_over
> > 210 ± 21% +42.1% 298 ± 28% sched_debug.cfs_rq[0]:/.tg_runnable_contrib
> > 9676 ± 21% +42.1% 13754 ± 28% sched_debug.cfs_rq[0]:/.avg->runnable_avg_sum
> > 772 ± 25% +116.5% 1672 ± 9% sched_debug.cfs_rq[0]:/.tg_load_contrib
> > 8402 ± 9% +83.3% 15405 ± 11% sched_debug.cfs_rq[0]:/.tg_load_avg
> > 8356 ± 9% +82.8% 15272 ± 11% sched_debug.cfs_rq[1]:/.tg_load_avg
> > 968 ± 25% +100.8% 1943 ± 14% sched_debug.cfs_rq[1]:/.blocked_load_avg
> > 16242 ± 9% -22.2% 12643 ± 14% sched_debug.cfs_rq[1]:/.avg->runnable_avg_sum
> > 353 ± 9% -22.1% 275 ± 14% sched_debug.cfs_rq[1]:/.tg_runnable_contrib
> > 1183 ± 23% +77.7% 2102 ± 12% sched_debug.cfs_rq[1]:/.tg_load_contrib
> > 181 ± 8% -31.4% 124 ± 26% sched_debug.cfs_rq[2]:/.tg_runnable_contrib
> > 8364 ± 8% -31.3% 5745 ± 26% sched_debug.cfs_rq[2]:/.avg->runnable_avg_sum
> > 8297 ± 9% +81.7% 15079 ± 12% sched_debug.cfs_rq[2]:/.tg_load_avg
> > 30439 ± 13% -45.2% 16681 ± 26% sched_debug.cfs_rq[2]:/.exec_clock
> > 39735 ± 14% -48.3% 20545 ± 29% sched_debug.cfs_rq[2]:/.min_vruntime
> > 8231 ± 10% +82.2% 15000 ± 12% sched_debug.cfs_rq[3]:/.tg_load_avg
> > 1210 ± 14% +110.3% 2546 ± 30% sched_debug.cfs_rq[4]:/.tg_load_contrib
> > 8188 ± 10% +82.8% 14964 ± 12% sched_debug.cfs_rq[4]:/.tg_load_avg
> > 8132 ± 10% +83.1% 14890 ± 12% sched_debug.cfs_rq[5]:/.tg_load_avg
> > 749 ± 29% +205.9% 2292 ± 34% sched_debug.cfs_rq[5]:/.blocked_load_avg
> > 963 ± 30% +169.9% 2599 ± 33% sched_debug.cfs_rq[5]:/.tg_load_contrib
> > 37791 ± 32% -38.6% 23209 ± 13% sched_debug.cfs_rq[6]:/.min_vruntime
> > 693 ± 25% +132.2% 1609 ± 29% sched_debug.cfs_rq[6]:/.blocked_load_avg
> > 10838 ± 13% -39.2% 6587 ± 13% sched_debug.cfs_rq[6]:/.avg->runnable_avg_sum
> > 29329 ± 27% -33.2% 19577 ± 10% sched_debug.cfs_rq[6]:/.exec_clock
> > 235 ± 14% -39.7% 142 ± 14% sched_debug.cfs_rq[6]:/.tg_runnable_contrib
> > 8085 ± 10% +83.6% 14848 ± 12% sched_debug.cfs_rq[6]:/.tg_load_avg
> > 839 ± 25% +128.5% 1917 ± 18% sched_debug.cfs_rq[6]:/.tg_load_contrib
> > 8051 ± 10% +83.6% 14779 ± 12% sched_debug.cfs_rq[7]:/.tg_load_avg
> > 156 ± 34% +97.9% 309 ± 19% sched_debug.cpu#0.cpu_load[4]
> > 160 ± 25% +64.0% 263 ± 16% sched_debug.cpu#0.cpu_load[2]
> > 156 ± 32% +83.7% 286 ± 17% sched_debug.cpu#0.cpu_load[3]
> > 164 ± 20% -35.1% 106 ± 31% sched_debug.cpu#2.cpu_load[0]
> > 249 ± 15% +80.2% 449 ± 10% sched_debug.cpu#4.cpu_load[3]
> > 231 ± 11% +101.2% 466 ± 13% sched_debug.cpu#4.cpu_load[2]
> > 217 ± 14% +189.9% 630 ± 38% sched_debug.cpu#4.cpu_load[0]
> > 71951 ± 5% +21.6% 87526 ± 7% sched_debug.cpu#4.nr_load_updates
> > 214 ± 8% +146.1% 527 ± 27% sched_debug.cpu#4.cpu_load[1]
> > 256 ± 17% +75.7% 449 ± 13% sched_debug.cpu#4.cpu_load[4]
> > 209 ± 23% +98.3% 416 ± 48% sched_debug.cpu#5.cpu_load[2]
> > 68024 ± 2% +18.8% 80825 ± 1% sched_debug.cpu#5.nr_load_updates
> > 217 ± 26% +74.9% 380 ± 45% sched_debug.cpu#5.cpu_load[3]
> > 852 ± 21% -38.3% 526 ± 22% sched_debug.cpu#6.curr->pid
> >
> > lkp-st02: Core2
> > Memory: 8G
> >
> >
> >
> >
> > perf-stat.cache-misses
> >
> > 1.6e+09 O+-----O--O---O--O---O--------------------------------------------+
> > | O O O O O O O O O O |
> > 1.4e+09 ++ |
> > 1.2e+09 *+.*...* *..* * *...*..*...*..*...*..*...*..*...*..*
> > | : : : : : |
> > 1e+09 ++ : : : : : : |
> > | : : : : : : |
> > 8e+08 ++ : : : : : : |
> > | : : : : : : |
> > 6e+08 ++ : : : : : : |
> > 4e+08 ++ : : : : : : |
> > | : : : : : : |
> > 2e+08 ++ : : : : : : |
> > | : : : |
> > 0 ++-O------*----------*------*-------------------------------------+
> >
> >
> > perf-stat.L1-dcache-prefetches
> >
> > 1.2e+09 ++----------------------------------------------------------------+
> > *..*...* *..* * ..*.. ..*..*...*..*...*..*...*..*
> > 1e+09 ++ : : : : *. *. |
> > | : : : :: : |
> > | : : : : : : O |
> > 8e+08 O+ O: O :O O: O :O: O :O O O O O O O |
> > | : : : : : : |
> > 6e+08 ++ : : : : : : |
> > | : : : : : : |
> > 4e+08 ++ : : : : : : |
> > | : : : : : : |
> > | : : : : : : |
> > 2e+08 ++ :: :: : : |
> > | : : : |
> > 0 ++-O------*----------*------*-------------------------------------+
> >
> >
> > perf-stat.LLC-load-misses
> >
> > 1e+09 ++------------------------------------------------------------------+
> > 9e+08 O+ O O O O O |
> > | O O O O |
> > 8e+08 ++ O O O O O O |
> > 7e+08 ++ |
> > | |
> > 6e+08 *+..*..* *...* * *...*..*...*...*..*...*..*...*..*...*
> > 5e+08 ++ : : : :: : |
> > 4e+08 ++ : : : : : : |
> > | : : : : : : |
> > 3e+08 ++ : : : : : : |
> > 2e+08 ++ : : : : : : |
> > | : : : : : : |
> > 1e+08 ++ : :: : |
> > 0 ++--O------*---------*-------*--------------------------------------+
> >
> >
> > perf-stat.context-switches
> >
> > 3e+06 ++----------------------------------------------------------------+
> > | *...*..*... |
> > 2.5e+06 *+.*...* *..* * : *..*... .*...*..*... .*
> > | : : : : : *. *. |
> > O O: O :O O: O :: : O O O O O O |
> > 2e+06 ++ : : : :O: O :O O |
> > | : : : : : : |
> > 1.5e+06 ++ : : : : : : |
> > | : : : : : : |
> > 1e+06 ++ : : : : : : |
> > | : : : : : : |
> > | : : : : : : |
> > 500000 ++ :: : : :: |
> > | : : : |
> > 0 ++-O------*----------*------*-------------------------------------+
> >
> >
> > vmstat.system.cs
> >
> > 10000 ++------------------------------------------------------------------+
> > 9000 ++ *...*.. |
> > *...*..* *...* * : *...*...*.. ..*..*...*.. ..*
> > 8000 ++ : : : : : *. *. |
> > 7000 O+ O: O O O: O : : : O O O O O O |
> > | : : : :O: O :O O |
> > 6000 ++ : : : : : : |
> > 5000 ++ : : : : : : |
> > 4000 ++ : : : : : : |
> > | : : : : : : |
> > 3000 ++ : : : : : : |
> > 2000 ++ : : : : : : |
> > | : : :: :: |
> > 1000 ++ : : : |
> > 0 ++--O------*---------*-------*--------------------------------------+
> >
> >
> > [*] bisect-good sample
> > [O] bisect-bad sample
> >
> > To reproduce:
> >
> > apt-get install ruby
> > git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> > cd lkp-tests
> > bin/setup-local job.yaml # the job file attached in this email
> > bin/run-local job.yaml
> >
> >
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> >
> >
> > Thanks,
> > Ying Huang
> >
>

