[LKP] [SUNRPC] 127b21b89f9: nfs loopback setup: +129.7% fsmark.app_overhead, +14.2% fsmark.time.involuntary_context_switches

From: Huang Ying
Date: Mon Feb 16 2015 - 02:25:28 EST


FYI, we noticed the following changes on

git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
commit 127b21b89f9d8ba0dc23e47b8c35d8a0bac9d6fc ("SUNRPC: Adjust rpciod workqueue parameters")
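
For orientation only: the patch named above changes how the rpciod workqueue is
created in net/sunrpc/sched.c. Below is a minimal, hedged sketch of the kind of
adjustment the commit title describes; the exact flags and max_active value are
assumptions for illustration, not quoted from the commit itself.

#include <linux/workqueue.h>

static struct workqueue_struct *rpciod_workqueue;

static int rpciod_start(void)
{
	/* Before (assumed): reclaim-safe, at most one work item at a time:
	 *     alloc_workqueue("rpciod", WQ_MEM_RECLAIM, 1);
	 * "Adjusted parameters" (assumed): unbound, default concurrency
	 * limit, so RPC work items may be processed on any CPU in parallel:
	 */
	rpciod_workqueue = alloc_workqueue("rpciod",
					   WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
	return rpciod_workqueue != NULL;
}

Extra rpciod concurrency would plausibly appear as the additional involuntary
context switches reported below, but this report does not verify that reading.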


testbox/testcase/testparams: nhm4/fsmark/performance-1x-32t-1HDD-xfs-nfsv4-5K-400M-fsyncBeforeClose-16d-256fpd

40dd4b7aee1a8c3b 127b21b89f9d8ba0dc23e47b8c
---------------- --------------------------
%stddev %change %stddev
\ | \
8133446 ± 0% -1.3% 8023711 ± 0% fsmark.app_overhead
52620 ± 0% +60.6% 84523 ± 0% fsmark.time.involuntary_context_switches
5 ± 37% +708.7% 46 ± 16% sched_debug.cpu#1.cpu_load[4]
7 ± 47% +906.9% 73 ± 46% sched_debug.cpu#3.cpu_load[3]
7 ± 33% +528.6% 44 ± 21% sched_debug.cpu#1.cpu_load[2]
6 ± 32% +577.8% 45 ± 19% sched_debug.cpu#1.cpu_load[3]
7 ± 39% +753.6% 59 ± 27% sched_debug.cpu#3.cpu_load[4]
8 ± 39% +434.4% 42 ± 41% sched_debug.cpu#2.cpu_load[4]
7 ± 29% -67.7% 2 ± 34% sched_debug.cfs_rq[1]:/.nr_spread_over
475 ± 42% +264.9% 1733 ± 17% sched_debug.cfs_rq[2]:/.blocked_load_avg
9 ± 46% +254.1% 32 ± 42% sched_debug.cpu#5.cpu_load[4]
495 ± 41% +260.8% 1787 ± 17% sched_debug.cfs_rq[2]:/.tg_load_contrib
450 ± 26% +193.5% 1321 ± 14% sched_debug.cfs_rq[3]:/.tg_load_contrib
440 ± 29% +194.7% 1296 ± 14% sched_debug.cfs_rq[3]:/.blocked_load_avg
586 ± 30% +129.7% 1345 ± 17% sched_debug.cfs_rq[1]:/.blocked_load_avg
617 ± 27% +144.2% 1506 ± 15% sched_debug.cfs_rq[1]:/.tg_load_contrib
936 ± 28% +91.6% 1793 ± 16% sched_debug.cfs_rq[5]:/.tg_load_contrib
922 ± 29% +92.0% 1771 ± 16% sched_debug.cfs_rq[5]:/.blocked_load_avg
144242 ± 26% +70.3% 245615 ± 14% sched_debug.cpu#3.ttwu_local
159375 ± 23% +61.9% 257969 ± 13% sched_debug.cpu#3.sched_goidle
1754 ± 29% +61.0% 2824 ± 24% sched_debug.cfs_rq[0]:/.blocked_load_avg
171393 ± 22% +59.6% 273569 ± 13% sched_debug.cpu#3.ttwu_count
346764 ± 22% +58.3% 548977 ± 13% sched_debug.cpu#3.nr_switches
346840 ± 22% +58.3% 549053 ± 13% sched_debug.cpu#3.sched_count
1975 ± 19% +50.7% 2976 ± 19% sched_debug.cfs_rq[0]:/.tg_load_contrib
17240 ± 12% +37.6% 23728 ± 8% sched_debug.cpu#3.nr_load_updates
52620 ± 0% +60.6% 84523 ± 0% time.involuntary_context_switches
9440 ± 6% +61.6% 15259 ± 5% sched_debug.cfs_rq[0]:/.tg_load_avg
5200 ± 16% +49.2% 7759 ± 18% sched_debug.cfs_rq[3]:/.min_vruntime
9394 ± 6% +60.8% 15110 ± 6% sched_debug.cfs_rq[2]:/.tg_load_avg
9408 ± 6% +62.2% 15260 ± 5% sched_debug.cfs_rq[1]:/.tg_load_avg
9387 ± 6% +59.8% 14999 ± 7% sched_debug.cfs_rq[3]:/.tg_load_avg
9370 ± 6% +59.7% 14961 ± 6% sched_debug.cfs_rq[7]:/.tg_load_avg
9371 ± 6% +58.3% 14832 ± 7% sched_debug.cfs_rq[6]:/.tg_load_avg
9389 ± 6% +59.0% 14924 ± 6% sched_debug.cfs_rq[5]:/.tg_load_avg
9391 ± 6% +59.1% 14937 ± 6% sched_debug.cfs_rq[4]:/.tg_load_avg
2492 ± 10% +22.8% 3059 ± 8% sched_debug.cfs_rq[3]:/.exec_clock
654180 ± 10% -17.6% 539086 ± 10% sched_debug.cpu#3.avg_idle
67 ± 22% +29.9% 88 ± 4% sched_debug.cfs_rq[3]:/.tg_runnable_contrib
610001 ± 7% -18.3% 498400 ± 8% sched_debug.cpu#0.avg_idle
132930 ± 23% +39.8% 185843 ± 4% sched_debug.cpu#2.ttwu_local
4543 ± 14% +34.5% 6112 ± 4% sched_debug.cfs_rq[2]:/.min_vruntime
161157 ± 19% +33.0% 214350 ± 3% sched_debug.cpu#2.ttwu_count
148740 ± 20% +33.5% 198618 ± 3% sched_debug.cpu#2.sched_goidle
325967 ± 19% +32.1% 430669 ± 3% sched_debug.cpu#2.nr_switches
326041 ± 19% +32.1% 430744 ± 3% sched_debug.cpu#2.sched_count
17437 ± 14% +21.7% 21219 ± 8% sched_debug.cpu#2.nr_load_updates
3598 ± 7% +14.9% 4134 ± 5% sched_debug.cfs_rq[0]:/.avg->runnable_avg_sum
3338 ± 8% +18.6% 3959 ± 9% sched_debug.cfs_rq[1]:/.avg->runnable_avg_sum
1018 ± 31% -23.9% 775 ± 2% sched_debug.cfs_rq[4]:/.exec_clock
72 ± 8% +18.3% 85 ± 9% sched_debug.cfs_rq[1]:/.tg_runnable_contrib
8293 ± 45% -30.4% 5770 ± 4% sched_debug.cpu#4.nr_load_updates
2420 ± 7% +13.7% 2752 ± 0% sched_debug.cfs_rq[2]:/.exec_clock
9386 ± 1% +2.3% 9598 ± 1% vmstat.io.bo
40903 ± 0% +2.6% 41951 ± 0% vmstat.system.cs

testbox/testcase/testparams: nhm4/fsmark/performance-1x-32t-1HDD-xfs-nfsv4-16MB-60G-NoSync-16d-256fpd

40dd4b7aee1a8c3b 127b21b89f9d8ba0dc23e47b8c
---------------- --------------------------
50440392 ± 6% +129.7% 1.158e+08 ± 8% fsmark.app_overhead
226074 ± 0% +14.2% 258118 ± 0% fsmark.time.involuntary_context_switches
8 ± 5% +21.2% 10 ± 0% fsmark.time.percent_of_cpu_this_job_got
60.69 ± 0% +13.5% 68.86 ± 0% fsmark.time.system_time
2523773 ± 0% +1.4% 2558878 ± 0% fsmark.time.voluntary_context_switches
8 ± 39% +188.2% 24 ± 24% sched_debug.cpu#2.cpu_load[3]
8 ± 31% +197.1% 25 ± 13% sched_debug.cpu#2.cpu_load[4]
283664 ± 13% +237.6% 957681 ± 7% proc-vmstat.pgsteal_direct_dma32
30933 ± 17% +205.1% 94381 ± 30% sched_debug.cpu#5.ttwu_local
2119 ± 14% +231.0% 7014 ± 7% proc-vmstat.allocstall
16 ± 36% +59.1% 26 ± 3% sched_debug.cpu#0.cpu_load[4]
18 ± 30% +85.3% 34 ± 26% sched_debug.cpu#5.cpu_load[4]
213724 ± 4% +22.0% 260658 ± 17% sched_debug.cpu#6.ttwu_count
330125 ± 12% +195.4% 975107 ± 8% proc-vmstat.pgscan_direct_dma32
471 ± 49% +156.7% 1210 ± 34% sched_debug.cpu#1.curr->pid
1717 ± 11% +131.8% 3981 ± 2% proc-vmstat.nr_dirty
217 ± 9% +136.3% 514 ± 13% sched_debug.cfs_rq[1]:/.blocked_load_avg
229 ± 10% +138.3% 546 ± 18% sched_debug.cfs_rq[1]:/.tg_load_contrib
7162 ± 13% +121.3% 15849 ± 1% meminfo.Dirty
521 ± 11% +81.3% 945 ± 19% sched_debug.cfs_rq[5]:/.tg_load_contrib
506 ± 12% +84.3% 932 ± 19% sched_debug.cfs_rq[5]:/.blocked_load_avg
3 ± 0% +66.7% 5 ± 0% vmstat.procs.b
537 ± 20% +37.6% 739 ± 10% sched_debug.cfs_rq[0]:/.blocked_load_avg
574 ± 19% +41.1% 810 ± 12% sched_debug.cfs_rq[0]:/.tg_load_contrib
195 ± 6% -33.7% 129 ± 11% sched_debug.cpu#0.nr_uninterruptible
226074 ± 0% +14.2% 258118 ± 0% time.involuntary_context_switches
29991075 ± 14% -22.0% 23379253 ± 19% cpuidle.POLL.time
54075 ± 2% +42.5% 77063 ± 1% meminfo.Writeback
3770 ± 6% +51.9% 5725 ± 8% sched_debug.cfs_rq[0]:/.tg_load_avg
3762 ± 6% +50.0% 5645 ± 7% sched_debug.cfs_rq[2]:/.tg_load_avg
3768 ± 6% +50.9% 5687 ± 7% sched_debug.cfs_rq[1]:/.tg_load_avg
3767 ± 6% +49.3% 5624 ± 7% sched_debug.cfs_rq[3]:/.tg_load_avg
3756 ± 6% +48.7% 5585 ± 8% sched_debug.cfs_rq[7]:/.tg_load_avg
3762 ± 6% +48.9% 5602 ± 8% sched_debug.cfs_rq[6]:/.tg_load_avg
3762 ± 6% +48.6% 5591 ± 8% sched_debug.cfs_rq[5]:/.tg_load_avg
3761 ± 6% +50.1% 5647 ± 7% sched_debug.cfs_rq[4]:/.tg_load_avg
13632 ± 3% +40.9% 19210 ± 1% proc-vmstat.nr_writeback
10367 ± 3% +29.3% 13406 ± 9% sched_debug.cfs_rq[6]:/.min_vruntime
89961 ± 1% +35.0% 121410 ± 1% meminfo.NFS_Unstable
22527 ± 1% +35.6% 30545 ± 1% proc-vmstat.nr_unstable
207975 ± 0% +34.6% 280001 ± 8% sched_debug.cpu#5.ttwu_count
10227 ± 4% +30.6% 13360 ± 2% sched_debug.cfs_rq[5]:/.min_vruntime
143095 ± 4% -25.8% 106195 ± 4% proc-vmstat.pgactivate
759903 ± 15% +19.2% 905820 ± 13% sched_debug.cpu#0.nr_switches
760233 ± 15% +19.2% 906200 ± 13% sched_debug.cpu#0.sched_count
589 ± 7% +27.7% 753 ± 8% sched_debug.cfs_rq[4]:/.blocked_load_avg
8 ± 5% +21.2% 10 ± 0% time.percent_of_cpu_this_job_got
3703 ± 5% +41.5% 5241 ± 31% proc-vmstat.pgsteal_kswapd_dma
613 ± 6% +27.1% 779 ± 9% sched_debug.cfs_rq[4]:/.tg_load_contrib
18002 ± 9% +24.5% 22412 ± 10% sched_debug.cfs_rq[0]:/.min_vruntime
4051 ± 4% +38.7% 5619 ± 29% proc-vmstat.pgscan_kswapd_dma
570 ± 16% +30.2% 742 ± 19% sched_debug.cfs_rq[7]:/.tg_load_contrib
67791 ± 3% -20.1% 54186 ± 9% proc-vmstat.kswapd_inodesteal
2168 ± 1% +22.0% 2645 ± 0% slabinfo.kmalloc-128.active_slabs
2168 ± 1% +22.0% 2645 ± 0% slabinfo.kmalloc-128.num_slabs
69411 ± 1% +22.0% 84660 ± 0% slabinfo.kmalloc-128.num_objs
13 ± 23% -44.4% 7 ± 34% sched_debug.cfs_rq[0]:/.nr_spread_over
66478 ± 1% +18.6% 78816 ± 0% slabinfo.kmalloc-128.active_objs
0.50 ± 3% +14.0% 0.57 ± 5% time.user_time
4983 ± 4% +31.3% 6541 ± 26% proc-vmstat.pgalloc_dma
60407 ± 3% -15.5% 51037 ± 3% proc-vmstat.nr_active_file
241585 ± 3% -15.5% 204154 ± 3% meminfo.Active(file)
536 ± 3% -13.1% 466 ± 2% proc-vmstat.kswapd_high_wmark_hit_quickly
3892 ± 18% -22.4% 3018 ± 0% uptime.idle
264741 ± 3% -13.8% 228127 ± 3% meminfo.Active
60.69 ± 0% +13.5% 68.86 ± 0% time.system_time
11151 ± 6% +19.0% 13267 ± 1% sched_debug.cfs_rq[4]:/.min_vruntime
500911 ± 1% -10.7% 447478 ± 0% cpuidle.C6-NHM.usage
4.834e+08 ± 3% +11.3% 5.38e+08 ± 2% cpuidle.C3-NHM.time
19446 ± 0% +0.9% 19618 ± 0% vmstat.system.cs

testbox/testcase/testparams: nhm4/fsmark/performance-1x-32t-1HDD-xfs-nfsv4-8K-400M-fsyncBeforeClose-16d-256fpd

40dd4b7aee1a8c3b 127b21b89f9d8ba0dc23e47b8c
---------------- --------------------------
33823 ± 0% +56.8% 53052 ± 0% fsmark.time.involuntary_context_switches
49306 ± 44% -75.7% 11963 ± 24% sched_debug.cpu#6.sched_goidle
1 ± 34% +440.0% 6 ± 46% sched_debug.cfs_rq[4]:/.nr_spread_over
105610 ± 41% -69.9% 31780 ± 17% sched_debug.cpu#6.nr_switches
105651 ± 41% -69.9% 31817 ± 17% sched_debug.cpu#6.sched_count
11 ± 20% -47.7% 5 ± 33% sched_debug.cfs_rq[1]:/.nr_spread_over
44125 ± 49% -82.9% 7541 ± 41% sched_debug.cpu#6.ttwu_local
51122 ± 42% -71.1% 14769 ± 19% sched_debug.cpu#6.ttwu_count
5455 ± 26% -45.5% 2974 ± 5% sched_debug.cpu#6.nr_load_updates
800 ± 34% +148.9% 1991 ± 37% sched_debug.cfs_rq[0]:/.blocked_load_avg
1031 ± 26% +114.1% 2209 ± 34% sched_debug.cfs_rq[0]:/.tg_load_contrib
8 ± 32% +45.7% 12 ± 18% sched_debug.cfs_rq[2]:/.nr_spread_over
33823 ± 0% +56.8% 53052 ± 0% time.involuntary_context_switches
10789 ± 9% +25.4% 13532 ± 5% sched_debug.cfs_rq[0]:/.tg_load_avg
10783 ± 9% +25.2% 13497 ± 5% sched_debug.cfs_rq[2]:/.tg_load_avg
10777 ± 9% +25.2% 13494 ± 5% sched_debug.cfs_rq[1]:/.tg_load_avg
10799 ± 9% +24.9% 13490 ± 5% sched_debug.cfs_rq[3]:/.tg_load_avg
10734 ± 9% +25.4% 13456 ± 5% sched_debug.cfs_rq[7]:/.tg_load_avg
10743 ± 9% +25.6% 13495 ± 5% sched_debug.cfs_rq[6]:/.tg_load_avg
10743 ± 9% +25.7% 13502 ± 5% sched_debug.cfs_rq[5]:/.tg_load_avg
10798 ± 9% +24.8% 13481 ± 5% sched_debug.cfs_rq[4]:/.tg_load_avg
796 ± 14% -21.0% 629 ± 5% sched_debug.cfs_rq[6]:/.exec_clock
2161 ± 17% -25.4% 1612 ± 5% sched_debug.cfs_rq[6]:/.min_vruntime
297542 ± 9% -22.0% 232177 ± 17% sched_debug.cpu#0.nr_switches
297589 ± 9% -22.0% 232227 ± 17% sched_debug.cpu#0.sched_count
1901 ± 8% +23.2% 2342 ± 10% sched_debug.cfs_rq[6]:/.tg_load_contrib
7060 ± 10% -13.3% 6122 ± 7% sched_debug.cfs_rq[0]:/.min_vruntime
748854 ± 5% -20.0% 599063 ± 6% sched_debug.cpu#0.avg_idle
1851 ± 6% +24.1% 2298 ± 10% sched_debug.cfs_rq[6]:/.blocked_load_avg
140093 ± 9% -23.4% 107299 ± 19% sched_debug.cpu#0.ttwu_local
141664 ± 9% -23.0% 109136 ± 18% sched_debug.cpu#0.sched_goidle
52 ± 13% +29.7% 67 ± 2% sched_debug.cfs_rq[2]:/.tg_runnable_contrib
2440 ± 13% +28.1% 3126 ± 2% sched_debug.cfs_rq[2]:/.avg->runnable_avg_sum
165266 ± 8% -19.5% 133017 ± 15% sched_debug.cpu#0.ttwu_count
17128 ± 5% -14.3% 14673 ± 8% sched_debug.cpu#0.nr_load_updates
34290 ± 0% +2.0% 34989 ± 0% vmstat.system.cs

nhm4: Nehalem
Memory: 4G

time.involuntary_context_switches

60000 ++------------------------------------------------------------------+
| |
55000 O+OO O O OO O OO O OO O O OO O OO O OO O O OO O OO O OO O O OO |
| O OO O
| |
50000 ++ |
| |
45000 ++ |
| |
40000 ++ |
| |
| |
35000 ++* .*.**.*.* .*.**. .**.*. .*.* .*.*. *. .**.*.*.* |
* *.* * *.* ** * **.*.* * |
30000 ++------------------------------------------------------------------+

[*] bisect-good sample
[O] bisect-bad sample

To reproduce:

apt-get install ruby
git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
cd lkp-tests
bin/setup-local job.yaml # the job file attached in this email
bin/run-local job.yaml
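
(bin/setup-local prepares the local machine for the job and bin/run-local then
executes it. The "job file attached in this email" is the YAML reproduced after
the "---" marker below; the exact command sequence the harness ran for the
8K/fsyncBeforeClose configuration is listed at the end of this report.)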


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Ying Huang

---
testcase: fsmark
default-monitors:
  wait: pre-test
  uptime:
  iostat:
  vmstat:
  numa-numastat:
  numa-vmstat:
  numa-meminfo:
  proc-vmstat:
  proc-stat:
  meminfo:
  slabinfo:
  interrupts:
  lock_stat:
  latency_stats:
  softirqs:
  bdi_dev_mapping:
  diskstats:
  nfsstat:
  cpuidle:
  cpufreq-stats:
  turbostat:
  pmeter:
  sched_debug:
    interval: 10
default_watchdogs:
  watch-oom:
  watchdog:
cpufreq_governor: performance
commit: ea2bbe3b9bf930408db205344fe10c8f719ba738
model: Nehalem
nr_cpu: 8
memory: 4G
hdd_partitions: "/dev/disk/by-id/ata-WDC_WD1003FBYZ-010FB0_WD-WCAW36812041-part1"
swap_partitions: "/dev/disk/by-id/ata-WDC_WD1003FBYZ-010FB0_WD-WCAW36812041-part2"
rootfs_partition: "/dev/disk/by-id/ata-WDC_WD1003FBYZ-010FB0_WD-WCAW36812041-part3"
netconsole_port: 6649
iterations: 1x
nr_threads: 32t
disk: 1HDD
fs: xfs
fs2: nfsv4
fsmark:
  filesize: 8K
  test_size: 400M
  sync_method: fsyncBeforeClose
  nr_directories: 16d
  nr_files_per_directory: 256fpd
testbox: nhm4
tbox_group: nhm4
kconfig: x86_64-rhel
enqueue_time: 2015-02-12 06:18:32.722282365 +08:00
head_commit: ea2bbe3b9bf930408db205344fe10c8f719ba738
base_commit: bfa76d49576599a4b9f9b7a71f23d73d6dcff735
branch: linux-devel/devel-hourly-2015021423
kernel: "/kernel/x86_64-rhel/ea2bbe3b9bf930408db205344fe10c8f719ba738/vmlinuz-3.19.0-gea2bbe3"
user: lkp
queue: cyclic
rootfs: debian-x86_64-2015-02-07.cgz
result_root: "/result/nhm4/fsmark/performance-1x-32t-1HDD-xfs-nfsv4-8K-400M-fsyncBeforeClose-16d-256fpd/debian-x86_64-2015-02-07.cgz/x86_64-rhel/ea2bbe3b9bf930408db205344fe10c8f719ba738/0"
job_file: "/lkp/scheduled/nhm4/cyclic_fsmark-performance-1x-32t-1HDD-xfs-nfsv4-8K-400M-fsyncBeforeClose-16d-256fpd-x86_64-rhel-HEAD-ea2bbe3b9bf930408db205344fe10c8f719ba738-0-20150212-91366-1x0qnc1.yaml"
dequeue_time: 2015-02-15 04:00:09.713437991 +08:00
job_state: finished
loadavg: 22.42 10.60 4.03 1/175 2979
start_time: '1423944037'
end_time: '1423944165'
version: "/lkp/lkp/.src-20150213-094846"

# Format the backing disk with XFS and mount it locally.
mkfs -t xfs /dev/sda1
mount -t xfs -o nobarrier,inode64 /dev/sda1 /fs/sda1
# Start the NFS server, then loopback-mount the same export over NFSv4.
/etc/init.d/rpcbind start
/etc/init.d/nfs-common start
/etc/init.d/nfs-kernel-server start
mount -t nfs -o vers=4 localhost:/fs/sda1 /nfs/sda1
# Run fs_mark across 32 worker directories on the loopback NFS mount
# (per the job parameters: 8K files, fsync before close, 16 subdirectories,
# 256 files per directory).
./fs_mark -d /nfs/sda1/1 -d /nfs/sda1/2 -d /nfs/sda1/3 -d /nfs/sda1/4 -d /nfs/sda1/5 -d /nfs/sda1/6 -d /nfs/sda1/7 -d /nfs/sda1/8 -d /nfs/sda1/9 -d /nfs/sda1/10 -d /nfs/sda1/11 -d /nfs/sda1/12 -d /nfs/sda1/13 -d /nfs/sda1/14 -d /nfs/sda1/15 -d /nfs/sda1/16 -d /nfs/sda1/17 -d /nfs/sda1/18 -d /nfs/sda1/19 -d /nfs/sda1/20 -d /nfs/sda1/21 -d /nfs/sda1/22 -d /nfs/sda1/23 -d /nfs/sda1/24 -d /nfs/sda1/25 -d /nfs/sda1/26 -d /nfs/sda1/27 -d /nfs/sda1/28 -d /nfs/sda1/29 -d /nfs/sda1/30 -d /nfs/sda1/31 -d /nfs/sda1/32 -D 16 -N 256 -n 1600 -L 1 -S 1 -s 8192
_______________________________________________
LKP mailing list
LKP@xxxxxxxxxxxxxxx