Re: [sched/fair] caeb178c60f: +252.0% cpuidle.C1-SNB.time, +3.1% turbostat.Pkg_W

From: Fengguang Wu
Date: Thu Aug 21 2014 - 11:01:33 EST


On Thu, Aug 21, 2014 at 10:16:13AM -0400, Rik van Riel wrote:
> On 08/21/2014 10:01 AM, Fengguang Wu wrote:
> > Hi Rik,
> >
> > FYI, we noticed the below changes on
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core
> > commit caeb178c60f4f93f1b45c0bc056b5cf6d217b67f ("sched/fair: Make update_sd_pick_busiest() return 'true' on a busier sd")
> >
> > testbox/testcase/testparams: lkp-sb03/nepim/300s-100%-tcp6
>
> Is this good or bad?

The results are mixed: throughput is 2.4% better in the sequential
write test, while power consumption (turbostat.Pkg_W) increases by
3.1% in the nepim/300s-100%-tcp test.

> The numbers suggest the xfs + raid5 workload is doing around 2.4%
> more IO to disk per second with this change in, and there is more

Right.

> CPU idle time in the system...

Sorry "cpuidle" is the monitor name. You can find its code here:

https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/tree/monitors/cpuidle

"cpuidle.C1-SNB.time" means the time spend in C1 state.

> For the tcp test, I see no throughput numbers, but I see more
> idle time as well as more time in turbo mode, and more softirqs,
> which could mean that more packets were handled.

Again, "turbostat" is a monitor name. "turbostat.Pkg_W" means the
CPU package watts reported by the turbostat tool.
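
turbostat derives those watts from the RAPL energy MSRs. A rough
software-only equivalent, sketched against the powercap sysfs interface
(assuming intel-rapl:0 is the package domain of socket 0; a two-socket
box like lkp-sb03 also has intel-rapl:1):

#!/usr/bin/env python
# Sketch of a Pkg_W-style measurement via powercap sysfs; turbostat itself
# reads the RAPL MSRs directly. Illustration only.
import time

ENERGY = '/sys/class/powercap/intel-rapl:0/energy_uj'  # microjoules, wraps occasionally

def read_uj():
    with open(ENERGY) as f:
        return int(f.read())

def pkg_watts(interval=1.0):
    """Average package power over `interval` seconds."""
    e0, t0 = read_uj(), time.time()
    time.sleep(interval)
    e1, t1 = read_uj(), time.time()
    return (e1 - e0) / 1e6 / (t1 - t0)  # uJ -> J, divided by elapsed seconds

if __name__ == '__main__':
    print('Pkg_W (socket 0): %.2f' % pkg_watts())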

> Does the patch introduce any performance issues, or did it
> simply trip up something in the statistics that your script
> noticed?

In normal LKP reports, only the changed stats are listed. Here is the
performance/power index comparison, which covers all performance- and
power-related stats. Each index is the geometric average of its stats,
normalized so that the baseline commit 743cb1ff191f00f scores 100.

100 perf-index (the larger, the better)
98 power-index (the larger, the better)

743cb1ff191f00f caeb178c60f4f93f1b45c0bc0 testbox/testcase/testparams
--------------- ------------------------- ---------------------------
         %stddev     %change         %stddev
             \          |                /
   691053 ± 4%      -5.1%     656100 ± 4%  lkp-sb03/nepim/300s-100%-tcp
   570185 ± 7%      +5.4%     600774 ± 4%  lkp-sb03/nepim/300s-100%-tcp6
  1261238 ± 5%      -0.3%    1256875 ± 4%  TOTAL nepim.tcp.avg.kbps_in

743cb1ff191f00f caeb178c60f4f93f1b45c0bc0
--------------- -------------------------
   691216 ± 4%      -5.1%     656264 ± 4%  lkp-sb03/nepim/300s-100%-tcp
   570347 ± 7%      +5.4%     600902 ± 4%  lkp-sb03/nepim/300s-100%-tcp6
  1261564 ± 5%      -0.3%    1257167 ± 4%  TOTAL nepim.tcp.avg.kbps_out

743cb1ff191f00f caeb178c60f4f93f1b45c0bc0
--------------- -------------------------
    77.48 ± 1%      +3.1%      79.91 ± 1%  lkp-sb03/nepim/300s-100%-tcp
    79.69 ± 2%      -0.6%      79.21 ± 1%  lkp-sb03/nepim/300s-100%-tcp6
   157.17 ± 2%      +1.2%     159.13 ± 1%  TOTAL turbostat.Pkg_W

743cb1ff191f00f caeb178c60f4f93f1b45c0bc0
--------------- -------------------------
     6.05 ± 1%      +1.2%       6.12 ± 1%  lkp-sb03/nepim/300s-100%-tcp
     6.06 ± 0%      +1.0%       6.12 ± 1%  lkp-sb03/nepim/300s-100%-tcp6
    12.11 ± 1%      +1.1%      12.24 ± 1%  TOTAL turbostat.%c0

743cb1ff191f00f caeb178c60f4f93f1b45c0bc0
--------------- -------------------------
   325759 ± 0%      +2.4%     333577 ± 0%  lkp-st02/dd-write/5m-11HDD-RAID5-cfq-xfs-1dd
   325759 ± 0%      +2.4%     333577 ± 0%  TOTAL iostat.md0.wkB/s
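
For illustration, the index is essentially a geometric mean of per-stat
ratios against the baseline. A minimal sketch (not the exact LKP
formula; it uses only the nepim throughput and Pkg_W numbers above and
assumes power ratios are inverted so that larger is always better):

#!/usr/bin/env python
# Sketch of a geometric-mean performance/power index, baseline = 100.
import math

def geo_index(ratios):
    """Geometric mean of per-stat ratios, scaled so the baseline is 100."""
    log_sum = sum(math.log(r) for r in ratios)
    return 100.0 * math.exp(log_sum / len(ratios))

# Ratios taken from the nepim tables above (new value / baseline):
perf_ratios = [656100.0 / 691053, 600774.0 / 570185,   # kbps_in, tcp / tcp6
               656264.0 / 691216, 600902.0 / 570347]   # kbps_out, tcp / tcp6
# Power: baseline / new value, so lower power => larger ratio:
power_ratios = [77.48 / 79.91, 79.69 / 79.21]          # Pkg_W, tcp / tcp6

print('perf-index:  %.0f' % geo_index(perf_ratios))    # ~100 with these inputs
print('power-index: %.0f' % geo_index(power_ratios))   # ~99; the real index folds in more stats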

The nepim throughput changes are small compared to their run-to-run
variance, so they were not regarded as real changes in the original
report. I will need to increase the test time to make them more stable.
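
In other words, a change is only treated as real when it clearly
exceeds the run-to-run noise. Something along these lines (an
illustration of the idea, not the exact filter LKP applies):

#!/usr/bin/env python
# Sketch: ignore a change whose magnitude is within the combined noise
# of the two kernels. Illustration only.

def is_real_change(pct_change, stddev_base_pct, stddev_new_pct):
    """Treat a change as real only if it exceeds the noise of both sides."""
    return abs(pct_change) > stddev_base_pct + stddev_new_pct

# nepim.tcp.avg.kbps_in (tcp6): +5.4% change with 7% and 4% stddev -> noise
print(is_real_change(5.4, 7, 4))        # False
# time.involuntary_context_switches: +3577.7% with 41% and 14% stddev
print(is_real_change(3577.7, 41, 14))   # True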

Thanks,
Fengguang

> > 743cb1ff191f00f caeb178c60f4f93f1b45c0bc0
> > --------------- -------------------------
> >   29718911 ±45%    +329.5%   1.277e+08 ±10%  cpuidle.C1E-SNB.time
> >        861 ±34%   +1590.4%       14564 ±31%  cpuidle.C3-SNB.usage
> >   1.65e+08 ±20%    +175.4%   4.544e+08 ±15%  cpuidle.C1-SNB.time
> >         24 ±41%    +247.6%          86 ±23%  numa-numastat.node1.other_node
> >      27717 ±11%     +98.7%       55085 ± 6%  softirqs.RCU
> >     180767 ±11%     +86.7%      337416 ±10%  cpuidle.C7-SNB.usage
> >     104591 ±14%     +77.4%      185581 ±10%  cpuidle.C1E-SNB.usage
> >        384 ±10%     +33.3%         512 ±11%  slabinfo.kmem_cache.num_objs
> >        384 ±10%     +33.3%         512 ±11%  slabinfo.kmem_cache.active_objs
> >        494 ± 8%     +25.9%         622 ± 9%  slabinfo.kmem_cache_node.active_objs
> >        512 ± 7%     +25.0%         640 ± 8%  slabinfo.kmem_cache_node.num_objs
> >      83427 ± 6%     +10.3%       92028 ± 5%  meminfo.DirectMap4k
> >       9508 ± 1%     +21.3%       11534 ± 7%  slabinfo.kmalloc-512.active_objs
> >       9838 ± 1%     +20.5%       11852 ± 6%  slabinfo.kmalloc-512.num_objs
> >      53997 ± 6%     +11.1%       59981 ± 4%  numa-meminfo.node1.Slab
> >       2662 ± 3%      -9.0%        2424 ± 3%  slabinfo.kmalloc-96.active_objs
> >       2710 ± 3%      -8.6%        2478 ± 3%  slabinfo.kmalloc-96.num_objs
> >        921 ±41%   +3577.7%       33901 ±14%  time.involuntary_context_switches
> >       2371 ± 2%     +15.5%        2739 ± 2%  vmstat.system.in
> >
> > testbox/testcase/testparams: lkp-sb03/nepim/300s-100%-tcp
> >
> > 743cb1ff191f00f caeb178c60f4f93f1b45c0bc0
> > --------------- -------------------------
> >   20657207 ±31%    +358.2%    94650352 ±18%  cpuidle.C1E-SNB.time
> >   29718911 ±45%    +329.5%   1.277e+08 ±10%  cpuidle.C1E-SNB.time
> >        861 ±34%   +1590.4%       14564 ±31%  cpuidle.C3-SNB.usage
> >       0.05 ±46%    +812.5%        0.44 ±34%  turbostat.%c3
> >   1.12e+08 ±25%    +364.8%   5.207e+08 ±15%  cpuidle.C1-SNB.time
> >   1.65e+08 ±20%    +175.4%   4.544e+08 ±15%  cpuidle.C1-SNB.time
> >         35 ±19%    +105.6%          72 ±28%  numa-numastat.node1.other_node
> >         24 ±41%    +247.6%          86 ±23%  numa-numastat.node1.other_node
> >         43 ±22%     +86.2%          80 ±26%  numa-vmstat.node0.nr_dirtied
> >      24576 ± 6%    +113.9%       52574 ± 1%  softirqs.RCU
> >      27717 ±11%     +98.7%       55085 ± 6%  softirqs.RCU
> >     211533 ± 6%     +58.4%      334990 ± 8%  cpuidle.C7-SNB.usage
> >     180767 ±11%     +86.7%      337416 ±10%  cpuidle.C7-SNB.usage
> >      77739 ±13%     +52.9%      118876 ±18%  cpuidle.C1E-SNB.usage
> >     104591 ±14%     +77.4%      185581 ±10%  cpuidle.C1E-SNB.usage
> >      32.09 ±14%     -24.8%       24.12 ±18%  turbostat.%pc2
> >       9.04 ± 6%     +41.6%       12.80 ± 6%  turbostat.%c1
> >        384 ±10%     +33.3%         512 ±11%  slabinfo.kmem_cache.num_objs
> >        384 ±10%     +33.3%         512 ±11%  slabinfo.kmem_cache.active_objs
> >        494 ± 8%     +25.9%         622 ± 9%  slabinfo.kmem_cache_node.active_objs
> >        512 ± 7%     +25.0%         640 ± 8%  slabinfo.kmem_cache_node.num_objs
> >        379 ± 9%     +16.7%         443 ± 7%  numa-vmstat.node0.nr_page_table_pages
> >      83427 ± 6%     +10.3%       92028 ± 5%  meminfo.DirectMap4k
> >       1579 ± 6%     -15.3%        1338 ± 7%  numa-meminfo.node1.PageTables
> >        394 ± 6%     -15.1%         334 ± 7%  numa-vmstat.node1.nr_page_table_pages
> >       1509 ± 7%     +16.6%        1760 ± 7%  numa-meminfo.node0.PageTables
> >      12681 ± 1%     -17.3%       10482 ±14%  numa-meminfo.node1.AnonPages
> >       3169 ± 1%     -17.3%        2620 ±14%  numa-vmstat.node1.nr_anon_pages
> >      10171 ± 3%     +10.9%       11283 ± 3%  slabinfo.kmalloc-512.active_objs
> >       9508 ± 1%     +21.3%       11534 ± 7%  slabinfo.kmalloc-512.active_objs
> >      10481 ± 3%     +10.9%       11620 ± 3%  slabinfo.kmalloc-512.num_objs
> >       9838 ± 1%     +20.5%       11852 ± 6%  slabinfo.kmalloc-512.num_objs
> >      53997 ± 6%     +11.1%       59981 ± 4%  numa-meminfo.node1.Slab
> >       5072 ± 1%     +11.6%        5662 ± 3%  slabinfo.kmalloc-2048.num_objs
> >       4974 ± 1%     +11.6%        5551 ± 3%  slabinfo.kmalloc-2048.active_objs
> >      12824 ± 2%     -16.1%       10754 ±14%  numa-meminfo.node1.Active(anon)
> >       3205 ± 2%     -16.2%        2687 ±14%  numa-vmstat.node1.nr_active_anon
> >       2662 ± 3%      -9.0%        2424 ± 3%  slabinfo.kmalloc-96.active_objs
> >       2710 ± 3%      -8.6%        2478 ± 3%  slabinfo.kmalloc-96.num_objs
> >      15791 ± 1%     +15.2%       18192 ± 9%  numa-meminfo.node0.AnonPages
> >       3949 ± 1%     +15.2%        4549 ± 9%  numa-vmstat.node0.nr_anon_pages
> >      13669 ± 1%      -7.5%       12645 ± 2%  slabinfo.kmalloc-16.num_objs
> >        662 ±23%   +4718.6%       31918 ±12%  time.involuntary_context_switches
> >        921 ±41%   +3577.7%       33901 ±14%  time.involuntary_context_switches
> >       2463 ± 1%     +13.1%        2786 ± 3%  vmstat.system.in
> >       2371 ± 2%     +15.5%        2739 ± 2%  vmstat.system.in
> >      49.40 ± 2%      +4.8%       51.79 ± 2%  turbostat.Cor_W
> >      77.48 ± 1%      +3.1%       79.91 ± 1%  turbostat.Pkg_W
> >
> > testbox/testcase/testparams: lkp-st02/dd-write/5m-11HDD-RAID5-cfq-xfs-1dd
> >
> > 743cb1ff191f00f caeb178c60f4f93f1b45c0bc0
> > --------------- -------------------------
> >      18571 ± 7%     +31.4%       24396 ± 4%  proc-vmstat.pgscan_direct_normal
> >      39983 ± 2%     +38.3%       55286 ± 0%  perf-stat.cpu-migrations
> >    4193962 ± 2%     +20.9%     5072009 ± 3%  perf-stat.iTLB-load-misses
> >  4.568e+09 ± 2%     -17.2%   3.781e+09 ± 1%  perf-stat.L1-icache-load-misses
> >  1.762e+10 ± 0%      -7.8%   1.625e+10 ± 1%  perf-stat.cache-references
> >  1.408e+09 ± 1%      -6.6%   1.315e+09 ± 1%  perf-stat.branch-load-misses
> >  1.407e+09 ± 1%      -6.5%   1.316e+09 ± 1%  perf-stat.branch-misses
> >  6.839e+09 ± 1%      +5.0%   7.185e+09 ± 2%  perf-stat.LLC-loads
> >  1.558e+10 ± 0%      +3.5%   1.612e+10 ± 1%  perf-stat.L1-dcache-load-misses
> >  1.318e+12 ± 0%      +3.4%   1.363e+12 ± 0%  perf-stat.L1-icache-loads
> >  2.979e+10 ± 1%      +2.4%   3.051e+10 ± 0%  perf-stat.L1-dcache-store-misses
> >  1.893e+11 ± 0%      +2.5%    1.94e+11 ± 0%  perf-stat.branch-instructions
> >  2.298e+11 ± 0%      +2.7%   2.361e+11 ± 0%  perf-stat.L1-dcache-stores
> >  1.016e+12 ± 0%      +2.6%   1.042e+12 ± 0%  perf-stat.instructions
> >  1.892e+11 ± 0%      +2.5%    1.94e+11 ± 0%  perf-stat.branch-loads
> >   3.71e+11 ± 0%      +2.4%   3.799e+11 ± 0%  perf-stat.dTLB-loads
> >  3.711e+11 ± 0%      +2.3%   3.798e+11 ± 0%  perf-stat.L1-dcache-loads
> >     325768 ± 0%      +2.7%      334461 ± 0%  vmstat.io.bo
> >       8083 ± 0%      +2.4%        8278 ± 0%  iostat.sdf.wrqm/s
> >       8083 ± 0%      +2.4%        8278 ± 0%  iostat.sdk.wrqm/s
> >       8082 ± 0%      +2.4%        8276 ± 0%  iostat.sdg.wrqm/s
> >      32615 ± 0%      +2.4%       33398 ± 0%  iostat.sdf.wkB/s
> >      32617 ± 0%      +2.4%       33401 ± 0%  iostat.sdk.wkB/s
> >      32612 ± 0%      +2.4%       33393 ± 0%  iostat.sdg.wkB/s
> >       8083 ± 0%      +2.4%        8277 ± 0%  iostat.sdl.wrqm/s
> >       8083 ± 0%      +2.4%        8276 ± 0%  iostat.sdi.wrqm/s
> >       8082 ± 0%      +2.4%        8277 ± 0%  iostat.sdc.wrqm/s
> >      32614 ± 0%      +2.4%       33396 ± 0%  iostat.sdl.wkB/s
> >       8083 ± 0%      +2.4%        8278 ± 0%  iostat.sde.wrqm/s
> >       8082 ± 0%      +2.4%        8277 ± 0%  iostat.sdh.wrqm/s
> >       8083 ± 0%      +2.4%        8277 ± 0%  iostat.sdd.wrqm/s
> >      32614 ± 0%      +2.4%       33393 ± 0%  iostat.sdi.wkB/s
> >      32611 ± 0%      +2.4%       33395 ± 0%  iostat.sdc.wkB/s
> >     325759 ± 0%      +2.4%      333577 ± 0%  iostat.md0.wkB/s
> >       1274 ± 0%      +2.4%        1305 ± 0%  iostat.md0.w/s
> >       8082 ± 0%      +2.4%        8277 ± 0%  iostat.sdb.wrqm/s
> >      32618 ± 0%      +2.4%       33398 ± 0%  iostat.sde.wkB/s
> >      32612 ± 0%      +2.4%       33395 ± 0%  iostat.sdh.wkB/s
> >      32618 ± 0%      +2.4%       33397 ± 0%  iostat.sdd.wkB/s
> >       8084 ± 0%      +2.4%        8278 ± 0%  iostat.sdj.wrqm/s
> >      32611 ± 0%      +2.4%       33396 ± 0%  iostat.sdb.wkB/s
> >      32618 ± 0%      +2.4%       33400 ± 0%  iostat.sdj.wkB/s
> >    2.3e+11 ± 0%      +2.5%   2.357e+11 ± 0%  perf-stat.dTLB-stores
> >       4898 ± 0%      +2.1%        5003 ± 0%  vmstat.system.cs
> >  1.017e+12 ± 0%      +2.4%   1.042e+12 ± 0%  perf-stat.iTLB-loads
> >    1518279 ± 0%      +2.1%     1549457 ± 0%  perf-stat.context-switches
> >  1.456e+12 ± 0%      +1.4%   1.476e+12 ± 0%  perf-stat.cpu-cycles
> >  1.456e+12 ± 0%      +1.3%   1.475e+12 ± 0%  perf-stat.ref-cycles
> >  1.819e+11 ± 0%      +1.3%   1.843e+11 ± 0%  perf-stat.bus-cycles
> >
> > lkp-sb03 is a Sandy Bridge-EP server.
> > Memory: 64G
> > Architecture: x86_64
> > CPU op-mode(s): 32-bit, 64-bit
> > Byte Order: Little Endian
> > CPU(s): 32
> > On-line CPU(s) list: 0-31
> > Thread(s) per core: 2
> > Core(s) per socket: 8
> > Socket(s): 2
> > NUMA node(s): 2
> > Vendor ID: GenuineIntel
> > CPU family: 6
> > Model: 45
> > Stepping: 6
> > CPU MHz: 3500.613
> > BogoMIPS: 5391.16
> > Virtualization: VT-x
> > L1d cache: 32K
> > L1i cache: 32K
> > L2 cache: 256K
> > L3 cache: 20480K
> > NUMA node0 CPU(s): 0-7,16-23
> > NUMA node1 CPU(s): 8-15,24-31
> >
> > lkp-st02 is Core2
> > Memory: 8G
> >
> >
> >
> >
> > time.involuntary_context_switches
> >
> > 40000 O+------------------------------------------------------------------+
> > | O O O |
> > 35000 ++O O O O O O |
> > 30000 ++ O O O |
> > | O O O |
> > 25000 ++ O O |
> > | |
> > 20000 ++ |
> > | |
> > 15000 ++ |
> > 10000 ++ |
> > | |
> > 5000 ++ |
> > | .*. |
> > 0 *+*--*-*-*-*--*-*-*-*--*-*-*-*--*-*-*--*-*-*-*--*---*-*--*-*-*-*--*-*
> >
> >
> > [*] bisect-good sample
> > [O] bisect-bad sample
> >
> >
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> >
> > Thanks,
> > Fengguang
> >