Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression
From: Trond Myklebust
Date: Wed May 29 2019 - 22:04:50 EST
Hi Xing,
On Thu, 2019-05-30 at 09:35 +0800, Xing Zhengjun wrote:
> Hi Trond,
>
> On 5/20/2019 1:54 PM, kernel test robot wrote:
> > Greeting,
> >
> > FYI, we noticed a 16.0% regression of fsmark.app_overhead due to commit:
> >
> >
> > commit: 0472e476604998c127f3c80d291113e77c5676ac ("SUNRPC: Convert
> > socket page send code to use iov_iter()")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
> > master
> >
> > in testcase: fsmark
> > on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @
> > 3.00GHz with 384G memory
> > with following parameters:
> >
> > iterations: 1x
> > nr_threads: 64t
> > disk: 1BRD_48G
> > fs: xfs
> > fs2: nfsv4
> > filesize: 4M
> > test_size: 40G
> > sync_method: fsyncBeforeClose
> > cpufreq_governor: performance
> >
> > test-description: fsmark is a file system benchmark that tests
> > synchronous write workloads, for example a mail server workload.
> > test-url: https://sourceforge.net/projects/fsmark/
> >
> >
> >
> > Details are as below:
> > --------------------------------------------------------------------------------------------------->
> >
> >
> > To reproduce:
> >
> > git clone https://github.com/intel/lkp-tests.git
> > cd lkp-tests
> > bin/lkp install job.yaml   # job file is attached in this email
> > bin/lkp run job.yaml
> >
> > =========================================================================================
> > compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
> >   gcc-7/performance/1BRD_48G/4M/nfsv4/xfs/1x/x86_64-rhel-7.6/64t/debian-x86_64-2018-04-03.cgz/fsyncBeforeClose/lkp-ivb-ep01/40G/fsmark
> >
> > commit:
> > e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()")
> > 0472e47660 ("SUNRPC: Convert socket page send code to use iov_iter()")
> >
> > e791f8e9380d945e            0472e476604998c127f3c80d291
> > ----------------            ---------------------------
> >        fail:runs            %reproduction    fail:runs
> >            |                      |              |
> >            :4                    50%            2:4     dmesg.WARNING:at#for_ip_interrupt_entry/0x
> >          %stddev                %change        %stddev
> >              \                      |              \
> >   15118573 ± 2%      +16.0%   17538083        fsmark.app_overhead
> >     510.93           -22.7%     395.12        fsmark.files_per_sec
> >      24.90           +22.8%      30.57        fsmark.time.elapsed_time
> >      24.90           +22.8%      30.57        fsmark.time.elapsed_time.max
> >     288.00 ± 2%      -27.8%     208.00        fsmark.time.percent_of_cpu_this_job_got
> >      70.03 ± 2%      -11.3%      62.14        fsmark.time.system_time
> >    4391964           -16.7%    3658341        meminfo.max_used_kB
> >       6.10 ± 4%        +1.9       7.97 ± 3%   mpstat.cpu.all.iowait%
> >       0.27             -0.0       0.24 ± 3%   mpstat.cpu.all.soft%
> >   13668070 ± 40%    +118.0%   29801846 ± 19%  numa-numastat.node0.local_node
> >   13677774 ± 40%    +117.9%   29810258 ± 19%  numa-numastat.node0.numa_hit
> >       5.70 ± 3%      +32.1%       7.53 ± 3%   iostat.cpu.iowait
> >      16.42 ± 2%       -5.8%      15.47        iostat.cpu.system
> >       2.57            -4.1%       2.46        iostat.cpu.user
> >    1406781 ± 2%      -15.5%    1188498        vmstat.io.bo
> >     251792 ± 3%      -16.6%     209928        vmstat.system.cs
> >      84841            -1.9%      83239        vmstat.system.in
> >   97374502 ± 20%     +66.1%  1.617e+08 ± 17%  cpuidle.C1E.time
> >     573934 ± 19%     +44.6%     829662 ± 26%  cpuidle.C1E.usage
> >  5.892e+08 ± 8%      +15.3%  6.796e+08 ± 2%   cpuidle.C6.time
> >    1968016 ± 3%      -15.1%    1670867 ± 3%   cpuidle.POLL.time
> >     106420 ± 47%     +86.2%     198108 ± 35%  numa-meminfo.node0.Active
> >     106037 ± 48%     +86.2%     197395 ± 35%  numa-meminfo.node0.Active(anon)
> >     105052 ± 48%     +86.6%     196037 ± 35%  numa-meminfo.node0.AnonPages
> >     212876 ± 24%     -41.5%     124572 ± 56%  numa-meminfo.node1.Active
> >     211801 ± 24%     -41.5%     123822 ± 56%  numa-meminfo.node1.Active(anon)
> >     208559 ± 24%     -42.2%     120547 ± 57%  numa-meminfo.node1.AnonPages
> >       9955            +1.6%      10116        proc-vmstat.nr_kernel_stack
> >     452.25 ± 59%    +280.9%       1722 ±100%  proc-vmstat.numa_hint_faults_local
> >   33817303           +55.0%   52421773 ± 5%   proc-vmstat.numa_hit
> >   33804286           +55.0%   52408807 ± 5%   proc-vmstat.numa_local
> >   33923002           +81.8%   61663426 ± 5%   proc-vmstat.pgalloc_normal
> >     184765            +9.3%     201985        proc-vmstat.pgfault
> >   12840986          +216.0%   40581327 ± 7%   proc-vmstat.pgfree
> >      31447 ± 11%     -26.1%      23253 ± 13%  sched_debug.cfs_rq:/.min_vruntime.max
> >       4241 ± 3%      -12.2%       3724 ± 11%  sched_debug.cfs_rq:/.min_vruntime.stddev
> >      20631 ± 11%     -36.7%      13069 ± 29%  sched_debug.cfs_rq:/.spread0.max
> >       4238 ± 4%      -12.1%       3724 ± 11%  sched_debug.cfs_rq:/.spread0.stddev
> >     497105 ± 19%     -16.0%     417777 ± 4%   sched_debug.cpu.avg_idle.avg
> >      21199 ± 10%     -12.0%      18650 ± 3%   sched_debug.cpu.nr_load_updates.max
> >       2229 ± 10%     -15.0%       1895 ± 4%   sched_debug.cpu.nr_load_updates.stddev
> >       4.86 ± 5%      -23.6%       3.72 ± 5%   sched_debug.cpu.nr_uninterruptible.stddev
> >     524.75 ± 2%      -10.7%     468.50        turbostat.Avg_MHz
> >       5.26 ± 41%       -1.6       3.66 ± 2%   turbostat.C1%
> >     573633 ± 19%     +44.6%     829267 ± 26%  turbostat.C1E
> >       8.53 ± 20%       +3.4      11.88 ± 17%  turbostat.C1E%
> >      76.75            -6.4%      71.86        turbostat.CorWatt
> >    2534071 ± 2%      +16.2%    2943968        turbostat.IRQ
> >       2.89 ± 24%     +52.9%       4.42 ± 22%  turbostat.Pkg%pc2
> >     104.28            -4.6%      99.51        turbostat.PkgWatt
> >       2289           +18.8%       2720        turbostat.SMI
> >      26496 ± 47%     +86.2%      49347 ± 35%  numa-vmstat.node0.nr_active_anon
> >      26251 ± 48%     +86.7%      49008 ± 35%  numa-vmstat.node0.nr_anon_pages
> >      26496 ± 47%     +86.2%      49347 ± 35%  numa-vmstat.node0.nr_zone_active_anon
> >    8272770 ± 35%     +99.6%   16515977 ± 17%  numa-vmstat.node0.numa_hit
> >    8199681 ± 36%    +100.9%   16474282 ± 17%  numa-vmstat.node0.numa_local
> >      52953 ± 24%     -41.5%      30956 ± 56%  numa-vmstat.node1.nr_active_anon
> >      52144 ± 24%     -42.2%      30136 ± 57%  numa-vmstat.node1.nr_anon_pages
> >       2899 ± 2%      -19.4%       2336 ± 10%  numa-vmstat.node1.nr_writeback
> >      52953 ± 24%     -41.5%      30956 ± 56%  numa-vmstat.node1.nr_zone_active_anon
> >       4549 ± 8%      -18.9%       3689 ± 7%   numa-vmstat.node1.nr_zone_write_pending
> >      43.25             -1.6      41.69        perf-stat.i.cache-miss-rate%
> >     286377 ± 2%      -19.1%     231819 ± 2%   perf-stat.i.context-switches
> >  2.258e+10 ± 4%      -16.2%  1.893e+10 ± 10%  perf-stat.i.cpu-cycles
> >       4084           -34.0%       2696 ± 2%   perf-stat.i.cpu-migrations
> >   21563675 ± 24%     -30.8%   14929445 ± 5%   perf-stat.i.dTLB-load-misses
> >    1996341 ± 16%     -25.7%    1482826 ± 6%   perf-stat.i.dTLB-store-misses
> >  2.313e+09 ± 4%      -11.7%  2.042e+09 ± 6%   perf-stat.i.dTLB-stores
> >      75.93             -1.3      74.66        perf-stat.i.iTLB-load-miss-rate%
> >    1561381 ± 3%      -13.4%    1351731        perf-stat.i.iTLB-load-misses
> >     549643 ± 4%      -10.1%     494255 ± 3%   perf-stat.i.iTLB-loads
> >       6423           -10.0%       5779        perf-stat.i.minor-faults
> >      31.72 ± 2%        +4.4      36.14        perf-stat.i.node-load-miss-rate%
> >   23492890           +23.5%   29020305 ± 3%   perf-stat.i.node-load-misses
> >      15.13 ± 8%        +7.9      23.02 ± 3%   perf-stat.i.node-store-miss-rate%
> >   11517615           +71.3%   19734054 ± 2%   perf-stat.i.node-store-misses
> >       6423           -10.0%       5779        perf-stat.i.page-faults
> >      16.88 ± 14%     +33.4%      22.52 ± 23%  perf-stat.overall.MPKI
> >      46.67             -2.4      44.31        perf-stat.overall.cache-miss-rate%
> >      31.96             +4.8      36.72        perf-stat.overall.node-load-miss-rate%
> >      13.85             +8.0      21.87        perf-stat.overall.node-store-miss-rate%
> >  2.266e+08            +2.8%  2.331e+08 ± 2%   perf-stat.ps.cache-references
> >     275634 ± 2%      -18.5%     224718 ± 2%   perf-stat.ps.context-switches
> >  2.174e+10 ± 4%      -15.6%  1.835e+10 ± 10%  perf-stat.ps.cpu-cycles
> >       3931           -33.5%       2614 ± 2%   perf-stat.ps.cpu-migrations
> >   20746040 ± 24%     -30.2%   14476157 ± 5%   perf-stat.ps.dTLB-load-misses
> >    1921077 ± 16%     -25.2%    1437750 ± 6%   perf-stat.ps.dTLB-store-misses
> >  2.227e+09 ± 4%      -11.1%  1.979e+09 ± 6%   perf-stat.ps.dTLB-stores
> >    1503433 ± 3%      -12.8%    1310741        perf-stat.ps.iTLB-load-misses
> >     529058 ± 4%       -9.4%     479204 ± 3%   perf-stat.ps.iTLB-loads
> >       6200            -9.4%       5620        perf-stat.ps.minor-faults
> >   22613159           +24.4%   28133123 ± 3%   perf-stat.ps.node-load-misses
> >   11085254           +72.6%   19131576 ± 2%   perf-stat.ps.node-store-misses
> >       6200            -9.4%       5620        perf-stat.ps.page-faults
> >       7008 ± 13%     +55.2%      10876 ± 31%  softirqs.CPU1.NET_RX
> >       6509 ± 4%      +51.8%       9883 ± 27%  softirqs.CPU1.RCU
> >       7294 ± 13%     +36.7%       9974 ± 8%   softirqs.CPU10.RCU
> >       7800 ± 14%     +85.5%      14469 ± 23%  softirqs.CPU12.NET_RX
> >       5697 ± 43%     +77.5%      10110 ± 24%  softirqs.CPU13.NET_RX
> >      15944 ± 9%      +14.6%      18278 ± 12%  softirqs.CPU14.TIMER
> >       6064 ± 19%     +68.6%      10223 ± 31%  softirqs.CPU15.NET_RX
> >       7796 ± 14%     +80.3%      14059 ± 25%  softirqs.CPU16.NET_RX
> >      15934 ± 10%     +22.1%      19452 ± 11%  softirqs.CPU18.TIMER
> >       6725 ± 7%      +40.0%       9413 ± 18%  softirqs.CPU19.NET_RX
> >       5710 ± 3%      +53.4%       8756 ± 12%  softirqs.CPU20.RCU
> >       7018 ± 14%     +65.5%      11616 ± 40%  softirqs.CPU21.NET_RX
> >       6389 ± 18%     +66.9%      10666 ± 31%  softirqs.CPU23.NET_RX
> >       7259 ± 7%      +36.1%       9881 ± 6%   softirqs.CPU24.RCU
> >       6491 ± 20%     +58.5%      10289 ± 33%  softirqs.CPU25.NET_RX
> >       7090 ± 10%     +58.7%      11256 ± 29%  softirqs.CPU27.NET_RX
> >       6333 ± 27%     +70.3%      10786 ± 27%  softirqs.CPU29.NET_RX
> >       5813 ± 20%     +84.2%      10706 ± 36%  softirqs.CPU3.NET_RX
> >       7041 ± 23%    +105.7%      14483 ± 21%  softirqs.CPU30.NET_RX
> >       6654 ± 7%      +64.1%      10918 ± 29%  softirqs.CPU31.NET_RX
> >      18019 ± 10%     +11.1%      20016 ± 7%   softirqs.CPU31.TIMER
> >       4666 ± 8%     +104.0%       9518 ± 23%  softirqs.CPU32.RCU
> >      15721 ± 16%     +35.9%      21371 ± 11%  softirqs.CPU32.TIMER
> >      15684 ± 13%     +40.0%      21959 ± 18%  softirqs.CPU34.TIMER
> >       6489 ± 15%     +69.7%      11013 ± 37%  softirqs.CPU35.NET_RX
> >       7930 ± 7%      +82.1%      14442 ± 29%  softirqs.CPU36.NET_RX
> >      15744 ± 14%     +24.3%      19563 ± 6%   softirqs.CPU36.TIMER
> >       7028 ± 13%     +29.3%       9085 ± 11%  softirqs.CPU39.NET_RX
> >       7491 ± 22%    +100.4%      15011 ± 24%  softirqs.CPU4.NET_RX
> >       6119 ± 13%     +58.7%       9710 ± 37%  softirqs.CPU5.NET_RX
> >       6980 ± 8%      +47.8%      10318 ± 42%  softirqs.CPU7.NET_RX
> >     285674           +65.7%     473395        softirqs.NET_RX
> >     267950 ± 5%      +21.1%     324597        softirqs.RCU
> >     238298 ± 2%      +15.1%     274371 ± 2%   softirqs.SCHED
> >     689305 ± 3%      +14.1%     786236 ± 3%   softirqs.TIMER
> > 56196
> > Â 2% +19.9% 67389 interrupts.CPU0.LOC:Local_timer_
> > interrupts
> > 55971
> > Â 2% +19.7% 67005 interrupts.CPU1.LOC:Local_timer_
> > interrupts
> > 5265 Â 17% -32.2% 3568 Â
> > 29% interrupts.CPU1.RES:Rescheduling_interrupts
> > 56163
> > Â 2% +19.2% 66960 interrupts.CPU10.LOC:Local_timer
> > _interrupts
> > 56178
> > Â 2% +19.5% 67139 interrupts.CPU11.LOC:Local_timer
> > _interrupts
> > 4669 Â 15% -30.3% 3253 Â
> > 29% interrupts.CPU11.RES:Rescheduling_interrupts
> > 56081 +19.7% 67142 interrupts.CPU12.LOC
> > :Local_timer_interrupts
> > 55999
> > Â 2% +19.5% 66893 interrupts.CPU13.LOC:Local_timer
> > _interrupts
> > 4746 Â 23% -30.0% 3324 Â
> > 29% interrupts.CPU13.RES:Rescheduling_interrupts
> > 55991
> > Â 2% +19.5% 66898 interrupts.CPU14.LOC:Local_timer
> > _interrupts
> > 55855
> > Â 2% +20.0% 67031 interrupts.CPU15.LOC:Local_timer
> > _interrupts
> > 55943
> > Â 2% +19.9% 67087 interrupts.CPU16.LOC:Local_timer
> > _interrupts
> > 56164 +18.9% 66802 interrupts.CPU17.LOC
> > :Local_timer_interrupts
> > 4579 Â 20% -45.6% 2492 Â
> > 16% interrupts.CPU17.RES:Rescheduling_interrupts
> > 55913
> > Â 2% +19.6% 66860 interrupts.CPU18.LOC:Local_timer
> > _interrupts
> > 55820
> > Â 2% +19.7% 66822 interrupts.CPU19.LOC:Local_timer
> > _interrupts
> > 4733 Â 17% -35.5% 3052 Â
> > 21% interrupts.CPU19.RES:Rescheduling_interrupts
> > 55989
> > Â 2% +20.0% 67177 interrupts.CPU2.LOC:Local_timer_
> > interrupts
> > 55891
> > Â 2% +20.3% 67258 interrupts.CPU20.LOC:Local_timer
> > _interrupts
> > 55966
> > Â 2% +19.6% 66921 interrupts.CPU21.LOC:Local_timer
> > _interrupts
> > 4920 Â 21% -33.4% 3278 Â
> > 31% interrupts.CPU21.RES:Rescheduling_interrupts
> > 55945
> > Â 2% +19.9% 67098 interrupts.CPU22.LOC:Local_timer
> > _interrupts
> > 55945
> > Â 2% +19.9% 67073 interrupts.CPU23.LOC:Local_timer
> > _interrupts
> > 4972 Â 24% -34.1% 3277 Â
> > 36% interrupts.CPU23.RES:Rescheduling_interrupts
> > 56093
> > Â 2% +19.8% 67185 interrupts.CPU24.LOC:Local_timer
> > _interrupts
> > 55910
> > Â 2% +19.7% 66914 interrupts.CPU25.LOC:Local_timer
> > _interrupts
> > 4660 Â 25% -28.9% 3313 Â
> > 34% interrupts.CPU25.RES:Rescheduling_interrupts
> > 56105
> > Â 2% +18.8% 66631 interrupts.CPU26.LOC:Local_timer
> > _interrupts
> > 55928
> > Â 2% +19.5% 66827 interrupts.CPU27.LOC:Local_timer
> > _interrupts
> > 55934
> > Â 2% +19.3% 66740 interrupts.CPU28.LOC:Local_timer
> > _interrupts
> > 55918
> > Â 2% +19.5% 66812 interrupts.CPU29.LOC:Local_timer
> > _interrupts
> > 4825 Â 22% -30.3% 3362 Â
> > 30% interrupts.CPU29.RES:Rescheduling_interrupts
> > 55920
> > Â 2% +19.8% 67004 interrupts.CPU3.LOC:Local_timer_
> > interrupts
> > 56007
> > Â 2% +19.5% 66917 interrupts.CPU30.LOC:Local_timer
> > _interrupts
> > 56238 +18.8% 66838 interrupts.CPU31.LOC
> > :Local_timer_interrupts
> > 4859 Â 21% -33.0% 3257 Â
> > 36% interrupts.CPU31.RES:Rescheduling_interrupts
> > 56099 +19.3% 66921 interrupts.CPU32.LOC
> > :Local_timer_interrupts
> > 55963
> > Â 2% +19.5% 66866 interrupts.CPU33.LOC:Local_timer
> > _interrupts
> > 55992
> > Â 2% +19.2% 66767 interrupts.CPU34.LOC:Local_timer
> > _interrupts
> > 55964
> > Â 2% +19.6% 66939 interrupts.CPU35.LOC:Local_timer
> > _interrupts
> > 4797 Â 26% -33.6% 3185 Â
> > 38% interrupts.CPU35.RES:Rescheduling_interrupts
> > 55770
> > Â 2% +20.2% 67016 interrupts.CPU36.LOC:Local_timer
> > _interrupts
> > 55889 +20.0% 67095 interrupts.CPU37.LOC
> > :Local_timer_interrupts
> > 56140
> > Â 2% +18.9% 66743 interrupts.CPU38.LOC:Local_timer
> > _interrupts
> > 56040
> > Â 2% +19.2% 66828 interrupts.CPU39.LOC:Local_timer
> > _interrupts
> > 5126 Â 23% -40.5% 3052 Â
> > 11% interrupts.CPU39.RES:Rescheduling_interrupts
> > 56022
> > Â 2% +19.9% 67149 interrupts.CPU4.LOC:Local_timer_
> > interrupts
> > 55850
> > Â 2% +20.1% 67066 interrupts.CPU5.LOC:Local_timer_
> > interrupts
> > 4741 Â 23% -31.9% 3227 Â
> > 29% interrupts.CPU5.RES:Rescheduling_interrupts
> > 55912 +19.5% 66798 interrupts.CPU6.LOC:
> > Local_timer_interrupts
> > 56062
> > Â 2% +19.6% 67052 interrupts.CPU7.LOC:Local_timer_
> > interrupts
> > 56118 +19.6% 67142 interrupts.CPU8.LOC:
> > Local_timer_interrupts
> > 56110
> > Â 2% +19.6% 67121 interrupts.CPU9.LOC:Local_timer_
> > interrupts
> > 2216 Â 43% -69.4% 677.50
> > Â112% interrupts.CPU9.NMI:Non-maskable_interrupts
> > 2216 Â 43% -69.4% 677.50
> > Â112% interrupts.CPU9.PMI:Performance_monitoring_interrupts
> > 2240009
> > Â 2% +19.6% 2678940 interrupts.LOC:Local_timer_inter
> > rupts
> > 168296 Â 3% -
> > 10.1% 151314 interrupts.RES:Rescheduling_interrupts
> > 20.37 -3.3 17.04 Â 5% perf-
> > profile.calltrace.cycles-
> > pp.svc_process_common.svc_process.nfsd.kthread.ret_from_fork
> > 20.38 -3.3 17.04 Â 5% perf-
> > profile.calltrace.cycles-pp.svc_process.nfsd.kthread.ret_from_fork
> > 20.27 -3.3 16.95 Â 5% perf-
> > profile.calltrace.cycles-
> > pp.nfsd_dispatch.svc_process_common.svc_process.nfsd.kthread
> > 20.21 -3.3 16.91 Â 5% perf-
> > profile.calltrace.cycles-
> > pp.nfsd4_proc_compound.nfsd_dispatch.svc_process_common.svc_process
> > .nfsd
> > 26.02 -3.1 22.91 Â 4% perf-
> > profile.calltrace.cycles-pp.nfsd.kthread.ret_from_fork
> > 9.04 Â 2% -1.9 7.18 Â 7% perf-
> > profile.calltrace.cycles-
> > pp.nfsd_vfs_write.nfsd4_write.nfsd4_proc_compound.nfsd_dispatch.svc
> > _process_common
> > 9.02 Â 2% -1.9 7.17 Â 7% perf-
> > profile.calltrace.cycles-
> > pp.do_iter_readv_writev.do_iter_write.nfsd_vfs_write.nfsd4_write.nf
> > sd4_proc_compound
> > 9.02 Â 2% -1.9 7.16 Â 7% perf-
> > profile.calltrace.cycles-
> > pp.xfs_file_buffered_aio_write.do_iter_readv_writev.do_iter_write.n
> > fsd_vfs_write.nfsd4_write
> > 9.03 Â 2% -1.9 7.18 Â 7% perf-
> > profile.calltrace.cycles-
> > pp.do_iter_write.nfsd_vfs_write.nfsd4_write.nfsd4_proc_compound.nfs
> > d_dispatch
> > 9.07 Â 2% -1.9 7.22 Â 7% perf-
> > profile.calltrace.cycles-
> > pp.nfsd4_write.nfsd4_proc_compound.nfsd_dispatch.svc_process_common
> > .svc_process
> > 10.50 Â 2% -1.4 9.06 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.xfs_file_fsync.nfsd_commit.nfsd4_proc_compound.nfsd_dispatch.svc
> > _process_common
> > 10.51 Â 2% -1.4 9.08 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.nfsd_commit.nfsd4_proc_compound.nfsd_dispatch.svc_process_common
> > .svc_process
> > 10.45 Â 2% -1.4 9.02 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.file_write_and_wait_range.xfs_file_fsync.nfsd_commit.nfsd4_proc_
> > compound.nfsd_dispatch
> > 9.82 Â 3% -1.4 8.45 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.__filemap_fdatawrite_range.file_write_and_wait_range.xfs_file_fs
> > ync.nfsd_commit.nfsd4_proc_compound
> > 9.82 Â 3% -1.4 8.45 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.do_writepages.__filemap_fdatawrite_range.file_write_and_wait_ran
> > ge.xfs_file_fsync.nfsd_commit
> > 9.82 Â 3% -1.4 8.45 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.xfs_vm_writepages.do_writepages.__filemap_fdatawrite_range.file_
> > write_and_wait_range.xfs_file_fsync
> > 10.61 Â 5% -1.1 9.46 Â 4% perf-
> > profile.calltrace.cycles-pp.write
> > 7.54 -1.1 6.41 Â 6% perf-
> > profile.calltrace.cycles-
> > pp.iomap_apply.iomap_file_buffered_write.xfs_file_buffered_aio_writ
> > e.do_iter_readv_writev.do_iter_write
> > 7.54 -1.1 6.41 Â 6% perf-
> > profile.calltrace.cycles-
> > pp.iomap_file_buffered_write.xfs_file_buffered_aio_write.do_iter_re
> > adv_writev.do_iter_write.nfsd_vfs_write
> > 10.27 Â 5% -1.1 9.15 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > 7.49 -1.1 6.37 Â 6% perf-
> > profile.calltrace.cycles-
> > pp.iomap_write_actor.iomap_apply.iomap_file_buffered_write.xfs_file
> > _buffered_aio_write.do_iter_readv_writev
> > 10.23 Â 5% -1.1 9.11 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwfram
> > e.write
> > 10.35 Â 5% -1.1 9.23 Â 4% perf-
> > profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
> > 10.08 Â 5% -1.1 8.96 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_
> > after_hwframe
> > 10.33 Â 5% -1.1 9.22 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > 10.05 Â 5% -1.1 8.95 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.nfs_file_write.__vfs_write.vfs_write.ksys_write.do_syscall_64
> > 9.80 Â 5% -1.1 8.73 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.generic_perform_write.nfs_file_write.__vfs_write.vfs_write.ksys_
> > write
> > 5.31 Â 5% -1.0 4.27 Â 5% perf-
> > profile.calltrace.cycles-
> > pp.rpc_free_task.rpc_async_release.process_one_work.worker_thread.k
> > thread
> > 5.31 Â 5% -1.0 4.27 Â 5% perf-
> > profile.calltrace.cycles-
> > pp.rpc_async_release.process_one_work.worker_thread.kthread.ret_fro
> > m_fork
> > 4.38 Â 5% -1.0 3.35 Â 2% perf-
> > profile.calltrace.cycles-
> > pp.tcp_recvmsg.inet_recvmsg.svc_recvfrom.svc_tcp_recvfrom.svc_recv
> > 4.38 Â 6% -1.0 3.36 Â 2% perf-
> > profile.calltrace.cycles-
> > pp.inet_recvmsg.svc_recvfrom.svc_tcp_recvfrom.svc_recv.nfsd
> > 4.39 Â 6% -1.0 3.40 Â 2% perf-
> > profile.calltrace.cycles-
> > pp.svc_recvfrom.svc_tcp_recvfrom.svc_recv.nfsd.kthread
> > 3.53 Â 5% -0.9 2.62 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.skb_copy_datagram_iter.tcp_recvmsg.inet_recvmsg.svc_recvfrom.svc
> > _tcp_recvfrom
> > 3.52 Â 5% -0.9 2.62 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.__skb_datagram_iter.skb_copy_datagram_iter.tcp_recvmsg.inet_recv
> > msg.svc_recvfrom
> > 5.01 Â 6% -0.9 4.14 Â 3% perf-
> > profile.calltrace.cycles-pp.fsync
> > 4.09 Â 7% -0.8 3.24 Â 5% perf-
> > profile.calltrace.cycles-
> > pp.nfs_write_completion.rpc_free_task.rpc_async_release.process_one
> > _work.worker_thread
> > 4.90 Â 6% -0.8 4.05 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.__x64_sys_fsync.do_syscall_64.entry_SYSCALL_64_after_hwframe.fsy
> > nc
> > 4.90 Â 6% -0.8 4.05 Â 3% perf-
> > profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.fsync
> > 4.90 Â 6% -0.8 4.05 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.fsync
> > 4.90 Â 6% -0.8 4.05 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.nfs_file_fsync.do_fsync.__x64_sys_fsync.do_syscall_64.entry_SYSC
> > ALL_64_after_hwframe
> > 4.90 Â 6% -0.8 4.05 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.do_fsync.__x64_sys_fsync.do_syscall_64.entry_SYSCALL_64_after_hw
> > frame.fsync
> > 5.84 Â 2% -0.8 5.01 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.write_cache_pages.xfs_vm_writepages.do_writepages.__filemap_fdat
> > awrite_range.file_write_and_wait_range
> > 1.15 Â 7% -0.8 0.36 Â100% perf-
> > profile.calltrace.cycles-
> > pp.rwsem_spin_on_owner.rwsem_down_write_failed.call_rwsem_down_writ
> > e_failed.down_write.xfs_ilock
> > 4.38 Â 7% -0.8 3.59 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.filemap_write_and_wait_range.nfs_file_fsync.do_fsync.__x64_sys_f
> > sync.do_syscall_64
> > 5.17 Â 2% -0.8 4.39 Â 5% perf-
> > profile.calltrace.cycles-
> > pp.xfs_do_writepage.write_cache_pages.xfs_vm_writepages.do_writepag
> > es.__filemap_fdatawrite_range
> > 4.89 Â 4% -0.7 4.18 Â 5% perf-
> > profile.calltrace.cycles-
> > pp.brd_insert_page.brd_do_bvec.brd_make_request.generic_make_reques
> > t.submit_bio
> > 1.38 Â 9% -0.7 0.68 Â 22% perf-
> > profile.calltrace.cycles-
> > pp.down_write.xfs_ilock.xfs_file_buffered_aio_write.do_iter_readv_w
> > ritev.do_iter_write
> > 1.38 Â 9% -0.7 0.68 Â 22% perf-
> > profile.calltrace.cycles-
> > pp.call_rwsem_down_write_failed.down_write.xfs_ilock.xfs_file_buffe
> > red_aio_write.do_iter_readv_writev
> > 1.38 Â 9% -0.7 0.68 Â 22% perf-
> > profile.calltrace.cycles-
> > pp.rwsem_down_write_failed.call_rwsem_down_write_failed.down_write.
> > xfs_ilock.xfs_file_buffered_aio_write
> > 1.38 Â 9% -0.7 0.68 Â 22% perf-
> > profile.calltrace.cycles-
> > pp.xfs_ilock.xfs_file_buffered_aio_write.do_iter_readv_writev.do_it
> > er_write.nfsd_vfs_write
> > 4.01 Â 3% -0.6 3.40 Â 5% perf-
> > profile.calltrace.cycles-
> > pp.brd_make_request.generic_make_request.submit_bio.xfs_add_to_ioen
> > d.xfs_do_writepage
> > 4.01 Â 3% -0.6 3.40 Â 5% perf-
> > profile.calltrace.cycles-
> > pp.submit_bio.xfs_add_to_ioend.xfs_do_writepage.write_cache_pages.x
> > fs_vm_writepages
> > 4.01 Â 3% -0.6 3.40 Â 5% perf-
> > profile.calltrace.cycles-
> > pp.generic_make_request.submit_bio.xfs_add_to_ioend.xfs_do_writepag
> > e.write_cache_pages
> > 3.99 Â 3% -0.6 3.38 Â 5% perf-
> > profile.calltrace.cycles-
> > pp.brd_do_bvec.brd_make_request.generic_make_request.submit_bio.xfs
> > _add_to_ioend
> > 4.11 Â 3% -0.6 3.51 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.xfs_add_to_ioend.xfs_do_writepage.write_cache_pages.xfs_vm_write
> > pages.do_writepages
> > 3.97 Â 5% -0.5 3.43 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.submit_bio.xfs_submit_ioend.xfs_vm_writepages.do_writepages.__fi
> > lemap_fdatawrite_range
> > 3.97 Â 5% -0.5 3.43 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.generic_make_request.submit_bio.xfs_submit_ioend.xfs_vm_writepag
> > es.do_writepages
> > 3.98 Â 5% -0.5 3.44 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.xfs_submit_ioend.xfs_vm_writepages.do_writepages.__filemap_fdata
> > write_range.file_write_and_wait_range
> > 3.96 Â 5% -0.5 3.43 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.brd_make_request.generic_make_request.submit_bio.xfs_submit_ioen
> > d.xfs_vm_writepages
> > 3.92 Â 5% -0.5 3.40 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.brd_do_bvec.brd_make_request.generic_make_request.submit_bio.xfs
> > _submit_ioend
> > 2.89 Â 6% -0.5 2.38 Â 6% perf-
> > profile.calltrace.cycles-
> > pp.nfs_end_page_writeback.nfs_write_completion.rpc_free_task.rpc_as
> > ync_release.process_one_work
> > 2.77 Â 10% -0.5 2.28 Â 2% perf-
> > profile.calltrace.cycles-
> > pp.__filemap_fdatawrite_range.filemap_write_and_wait_range.nfs_file
> > _fsync.do_fsync.__x64_sys_fsync
> > 2.77 Â 10% -0.5 2.28 Â 2% perf-
> > profile.calltrace.cycles-
> > pp.nfs_writepages.do_writepages.__filemap_fdatawrite_range.filemap_
> > write_and_wait_range.nfs_file_fsync
> > 2.77 Â 10% -0.5 2.28 Â 2% perf-
> > profile.calltrace.cycles-
> > pp.do_writepages.__filemap_fdatawrite_range.filemap_write_and_wait_
> > range.nfs_file_fsync.do_fsync
> > 2.74 Â 10% -0.5 2.25 Â 2% perf-
> > profile.calltrace.cycles-
> > pp.write_cache_pages.nfs_writepages.do_writepages.__filemap_fdatawr
> > ite_range.filemap_write_and_wait_range
> > 3.12 Â 7% -0.5 2.64 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.nfs_write_begin.generic_perform_write.nfs_file_write.__vfs_write
> > .vfs_write
> > 2.93 Â 7% -0.5 2.47 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.grab_cache_page_write_begin.nfs_write_begin.generic_perform_writ
> > e.nfs_file_write.__vfs_write
> > 3.37 Â 4% -0.5 2.91 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.nfs_write_end.generic_perform_write.nfs_file_write.__vfs_write.v
> > fs_write
> > 2.51 Â 4% -0.4 2.06 Â 7% perf-
> > profile.calltrace.cycles-
> > pp.clear_page_erms.get_page_from_freelist.__alloc_pages_nodemask.br
> > d_insert_page.brd_do_bvec
> > 3.12 Â 5% -0.4 2.68 Â 7% perf-
> > profile.calltrace.cycles-
> > pp.__alloc_pages_nodemask.brd_insert_page.brd_do_bvec.brd_make_requ
> > est.generic_make_request
> > 2.91 Â 5% -0.4 2.48 Â 3% perf-
> > profile.calltrace.cycles-
> > pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.tcp_rec
> > vmsg.inet_recvmsg
> > 3.02 Â 5% -0.4 2.59 Â 7% perf-
> > profile.calltrace.cycles-
> > pp.get_page_from_freelist.__alloc_pages_nodemask.brd_insert_page.br
> > d_do_bvec.brd_make_request
> > 2.59 Â 4% -0.4 2.18 Â 5% perf-
> > profile.calltrace.cycles-
> > pp.nfs_updatepage.nfs_write_end.generic_perform_write.nfs_file_writ
> > e.__vfs_write
> > 2.84 Â 7% -0.4 2.43 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.pagecache_get_page.grab_cache_page_write_begin.nfs_write_begin.g
> > eneric_perform_write.nfs_file_write
> > 3.27 -0.4 2.86 Â 6% perf-
> > profile.calltrace.cycles-
> > pp.iov_iter_copy_from_user_atomic.iomap_write_actor.iomap_apply.iom
> > ap_file_buffered_write.xfs_file_buffered_aio_write
> > 3.20 -0.4 2.79 Â 6% perf-
> > profile.calltrace.cycles-
> > pp.memcpy_erms.iov_iter_copy_from_user_atomic.iomap_write_actor.iom
> > ap_apply.iomap_file_buffered_write
> > 2.85 Â 5% -0.4 2.44 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.memcpy_erms._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_
> > iter.tcp_recvmsg
> > 2.50 Â 4% -0.4 2.14 Â 7% perf-
> > profile.calltrace.cycles-
> > pp.iomap_write_begin.iomap_write_actor.iomap_apply.iomap_file_buffe
> > red_write.xfs_file_buffered_aio_write
> > 2.41 Â 4% -0.4 2.06 Â 7% perf-
> > profile.calltrace.cycles-
> > pp.grab_cache_page_write_begin.iomap_write_begin.iomap_write_actor.
> > iomap_apply.iomap_file_buffered_write
> > 2.02 Â 11% -0.3 1.67 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.nfs_writepages_callback.write_cache_pages.nfs_writepages.do_writ
> > epages.__filemap_fdatawrite_range
> > 0.60 Â 8% -0.3 0.26 Â100% perf-
> > profile.calltrace.cycles-
> > pp.iomap_set_page_dirty.iomap_write_end.iomap_write_actor.iomap_app
> > ly.iomap_file_buffered_write
> > 1.96 Â 11% -0.3 1.62 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.nfs_do_writepage.nfs_writepages_callback.write_cache_pages.nfs_w
> > ritepages.do_writepages
> > 1.55 Â 2% -0.3 1.22 Â 5% perf-
> > profile.calltrace.cycles-
> > pp.iomap_write_end.iomap_write_actor.iomap_apply.iomap_file_buffere
> > d_write.xfs_file_buffered_aio_write
> > 2.35 Â 4% -0.3 2.03 Â 7% perf-
> > profile.calltrace.cycles-
> > pp.pagecache_get_page.grab_cache_page_write_begin.iomap_write_begin
> > .iomap_write_actor.iomap_apply
> > 1.60 Â 2% -0.3 1.32 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.__filemap_fdatawait_range.filemap_write_and_wait_range.nfs_file_
> > fsync.do_fsync.__x64_sys_fsync
> > 1.49 Â 6% -0.2 1.24 Â 6% perf-
> > profile.calltrace.cycles-
> > pp._raw_spin_lock.brd_insert_page.brd_do_bvec.brd_make_request.gene
> > ric_make_request
> > 1.03 Â 6% -0.2 0.80 Â 7% perf-
> > profile.calltrace.cycles-
> > pp.end_page_writeback.nfs_end_page_writeback.nfs_write_completion.r
> > pc_free_task.rpc_async_release
> > 0.97 Â 7% -0.2 0.74 Â 9% perf-
> > profile.calltrace.cycles-
> > pp.test_clear_page_writeback.end_page_writeback.nfs_end_page_writeb
> > ack.nfs_write_completion.rpc_free_task
> > 1.45 Â 6% -0.2 1.23 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.add_to_page_cache_lru.pagecache_get_page.grab_cache_page_write_b
> > egin.nfs_write_begin.generic_perform_write
> > 0.97 Â 4% -0.2 0.77 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.poll_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_se
> > condary
> > 1.22 Â 6% -0.2 1.03 Â 6% perf-
> > profile.calltrace.cycles-
> > pp.nfs_commit_release_pages.nfs_commit_release.rpc_free_task.rpc_as
> > ync_release.process_one_work
> > 1.22 Â 6% -0.2 1.03 Â 5% perf-
> > profile.calltrace.cycles-
> > pp.nfs_commit_release.rpc_free_task.rpc_async_release.process_one_w
> > ork.worker_thread
> > 1.25 Â 5% -0.2 1.06 Â 8% perf-
> > profile.calltrace.cycles-
> > pp.add_to_page_cache_lru.pagecache_get_page.grab_cache_page_write_b
> > egin.iomap_write_begin.iomap_write_actor
> > 1.06 Â 2% -0.2 0.87 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.wait_on_page_bit_common.__filemap_fdatawait_range.filemap_write_
> > and_wait_range.nfs_file_fsync.do_fsync
> > 1.27 Â 5% -0.2 1.11 Â 6% perf-
> > profile.calltrace.cycles-
> > pp.wake_up_page_bit.nfs_end_page_writeback.nfs_write_completion.rpc
> > _free_task.rpc_async_release
> > 0.78 Â 8% -0.2 0.63 Â 6% perf-
> > profile.calltrace.cycles-
> > pp.__set_page_dirty_nobuffers.nfs_updatepage.nfs_write_end.generic_
> > perform_write.nfs_file_write
> > 1.11 Â 7% -0.1 0.99 Â 6% perf-
> > profile.calltrace.cycles-
> > pp.__wake_up_common.wake_up_page_bit.nfs_end_page_writeback.nfs_wri
> > te_completion.rpc_free_task
> > 0.68 -0.1 0.57 Â 5% perf-
> > profile.calltrace.cycles-
> > pp.nfs_create_request.nfs_updatepage.nfs_write_end.generic_perform_
> > write.nfs_file_write
> > 1.04 Â 7% -0.1 0.93 Â 5% perf-
> > profile.calltrace.cycles-
> > pp.autoremove_wake_function.__wake_up_common.wake_up_page_bit.nfs_e
> > nd_page_writeback.nfs_write_completion
> > 1.02 Â 8% -0.1 0.92 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.wake_up
> > _page_bit.nfs_end_page_writeback
> > 0.71 Â 2% -0.1 0.60 Â 8% perf-
> > profile.calltrace.cycles-
> > pp.__schedule.schedule.io_schedule.wait_on_page_bit_common.__filema
> > p_fdatawait_range
> > 0.72 Â 2% -0.1 0.61 Â 8% perf-
> > profile.calltrace.cycles-
> > pp.schedule.io_schedule.wait_on_page_bit_common.__filemap_fdatawait
> > _range.filemap_write_and_wait_range
> > 0.72 -0.1 0.62 Â 7% perf-
> > profile.calltrace.cycles-
> > pp.io_schedule.wait_on_page_bit_common.__filemap_fdatawait_range.fi
> > lemap_write_and_wait_range.nfs_file_fsync
> > 0.93 Â 5% -0.1 0.83 Â 5% perf-
> > profile.calltrace.cycles-
> > pp.__alloc_pages_nodemask.pagecache_get_page.grab_cache_page_write_
> > begin.nfs_write_begin.generic_perform_write
> > 0.73 -0.1 0.67 Â 5% perf-
> > profile.calltrace.cycles-
> > pp.end_page_writeback.xfs_destroy_ioend.process_one_work.worker_thr
> > ead.kthread
> > 4.86 Â 6% +0.5 5.39 perf-
> > profile.calltrace.cycles-pp.svc_recv.nfsd.kthread.ret_from_fork
> > 0.00 +0.5 0.54 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.get_page_from_freelist.__alloc_pages_nodemask.svc_recv.nfsd.kthr
> > ead
> > 0.00 +0.6 0.59 Â 6% perf-
> > profile.calltrace.cycles-
> > pp.ip6_xmit.inet6_csk_xmit.__tcp_transmit_skb.tcp_write_xmit.tcp_se
> > ndmsg_locked
> > 0.00 +0.6 0.63 Â 6% perf-
> > profile.calltrace.cycles-
> > pp.inet6_csk_xmit.__tcp_transmit_skb.tcp_write_xmit.tcp_sendmsg_loc
> > ked.tcp_sendmsg
> > 0.00 +0.7 0.65 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.__alloc_pages_nodemask.svc_recv.nfsd.kthread.ret_from_fork
> > 0.00 +0.7 0.74 Â 8% perf-
> > profile.calltrace.cycles-
> > pp.__tcp_transmit_skb.tcp_write_xmit.tcp_sendmsg_locked.tcp_sendmsg
> > .sock_sendmsg
> > 0.00 +0.9 0.86 Â 8% perf-
> > profile.calltrace.cycles-
> > pp.tcp_write_xmit.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_se
> > ndpages
> > 9.98 Â 4% +2.5 12.44 Â 4% perf-
> > profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
> > 9.61 Â 4% +2.5 12.15 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.process_one_work.worker_thread.kthread.ret_from_fork
> > 47.02 Â 3% +2.7 49.68 Â 4% perf-
> > profile.calltrace.cycles-pp.secondary_startup_64
> > 0.00 +3.5 3.54 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.memcpy_erms.memcpy_from_page._copy_from_iter_full.tcp_sendmsg_lo
> > cked.tcp_sendmsg
> > 0.00 +3.6 3.56 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.memcpy_from_page._copy_from_iter_full.tcp_sendmsg_locked.tcp_sen
> > dmsg.sock_sendmsg
> > 0.00 +3.7 3.67 Â 3% perf-
> > profile.calltrace.cycles-
> > pp._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg
> > .xs_sendpages
> > 2.29 Â 6% +3.8 6.05 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.rpc_async_schedule.process_one_work.worker_thread.kthread.ret_fr
> > om_fork
> > 2.28 Â 6% +3.8 6.05 Â 3% perf-
> > profile.calltrace.cycles-
> > pp.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread.
> > kthread
> > 1.86 Â 3% +3.8 5.64 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.call_transmit.__rpc_execute.rpc_async_schedule.process_one_work.
> > worker_thread
> > 1.85 Â 3% +3.8 5.64 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.xprt_transmit.call_transmit.__rpc_execute.rpc_async_schedule.pro
> > cess_one_work
> > 1.80 Â 4% +3.8 5.60 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.xs_tcp_send_request.xprt_transmit.call_transmit.__rpc_execute.rp
> > c_async_schedule
> > 1.79 Â 3% +3.8 5.60 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.xs_sendpages.xs_tcp_send_request.xprt_transmit.call_transmit.__r
> > pc_execute
> > 0.00 +5.5 5.47 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages.xs_tcp_
> > send_request
> > 0.00 +5.6 5.56 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.tcp_sendmsg.sock_sendmsg.xs_sendpages.xs_tcp_send_request.xprt_t
> > ransmit
> > 0.00 +5.6 5.57 Â 4% perf-
> > profile.calltrace.cycles-
> > pp.sock_sendmsg.xs_sendpages.xs_tcp_send_request.xprt_transmit.call
> > _transmit
> > 20.37 -3.3 17.04 Â 5% perf-
> > profile.children.cycles-pp.svc_process_common
> > 20.38 -3.3 17.04 Â 5% perf-
> > profile.children.cycles-pp.svc_process
> > 20.27 -3.3 16.95 Â 5% perf-
> > profile.children.cycles-pp.nfsd_dispatch
> > 20.21 -3.3 16.91 Â 5% perf-
> > profile.children.cycles-pp.nfsd4_proc_compound
> > 26.02 -3.1 22.91 Â 4% perf-
> > profile.children.cycles-pp.nfsd
> > 15.88 Â 5% -1.9 13.95 Â 3% perf-
> > profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 15.87 Â 5% -1.9 13.95 Â 4% perf-
> > profile.children.cycles-pp.do_syscall_64
> > 12.61 Â 3% -1.9 10.73 Â 3% perf-
> > profile.children.cycles-pp.__filemap_fdatawrite_range
> > 12.60 Â 3% -1.9 10.73 Â 3% perf-
> > profile.children.cycles-pp.do_writepages
> > 9.04 Â 2% -1.9 7.18 Â 7% perf-
> > profile.children.cycles-pp.nfsd_vfs_write
> > 9.02 Â 2% -1.9 7.17 Â 7% perf-
> > profile.children.cycles-pp.do_iter_readv_writev
> > 9.02 Â 2% -1.9 7.16 Â 7% perf-
> > profile.children.cycles-pp.xfs_file_buffered_aio_write
> > 9.03 Â 2% -1.9 7.18 Â 7% perf-
> > profile.children.cycles-pp.do_iter_write
> > 9.07 Â 2% -1.9 7.22 Â 7% perf-
> > profile.children.cycles-pp.nfsd4_write
> > 2.21 Â 4% -1.8 0.41 Â 4% perf-
> > profile.children.cycles-pp.inet_sendpage
> > 2.17 Â 4% -1.8 0.40 Â 5% perf-
> > profile.children.cycles-pp.tcp_sendpage
> > 10.50 Â 2% -1.4 9.06 Â 4% perf-
> > profile.children.cycles-pp.xfs_file_fsync
> > 10.51 Â 2% -1.4 9.08 Â 4% perf-
> > profile.children.cycles-pp.nfsd_commit
> > 10.45 Â 2% -1.4 9.02 Â 4% perf-
> > profile.children.cycles-pp.file_write_and_wait_range
> > 9.82 Â 3% -1.4 8.45 Â 3% perf-
> > profile.children.cycles-pp.xfs_vm_writepages
> > 8.59 Â 4% -1.3 7.27 Â 4% perf-
> > profile.children.cycles-pp.write_cache_pages
> > 8.02 Â 4% -1.2 6.87 Â 4% perf-
> > profile.children.cycles-pp.submit_bio
> > 8.02 Â 4% -1.2 6.87 Â 4% perf-
> > profile.children.cycles-pp.generic_make_request
> > 8.00 Â 4% -1.1 6.85 Â 4% perf-
> > profile.children.cycles-pp.brd_make_request
> > 7.94 Â 4% -1.1 6.79 Â 4% perf-
> > profile.children.cycles-pp.brd_do_bvec
> > 10.63 Â 5% -1.1 9.49 Â 4% perf-
> > profile.children.cycles-pp.write
> > 7.55 -1.1 6.41 Â 5% perf-
> > profile.children.cycles-pp.iomap_apply
> > 7.54 -1.1 6.41 Â 6% perf-
> > profile.children.cycles-pp.iomap_file_buffered_write
> > 10.28 Â 5% -1.1 9.16 Â 4% perf-
> > profile.children.cycles-pp.ksys_write
> > 7.50 -1.1 6.37 Â 6% perf-
> > profile.children.cycles-pp.iomap_write_actor
> > 4.68 Â 5% -1.1 3.56 Â 2% perf-
> > profile.children.cycles-pp.inet_recvmsg
> > 4.68 Â 5% -1.1 3.56 Â 2% perf-
> > profile.children.cycles-pp.tcp_recvmsg
> > 10.24 Â 5% -1.1 9.12 Â 4% perf-
> > profile.children.cycles-pp.vfs_write
> > 10.08 Â 5% -1.1 8.98 Â 4% perf-
> > profile.children.cycles-pp.__vfs_write
> > 10.05 Â 5% -1.1 8.95 Â 4% perf-
> > profile.children.cycles-pp.nfs_file_write
> > 5.51 Â 5% -1.1 4.43 Â 5% perf-
> > profile.children.cycles-pp.rpc_free_task
> > 9.82 Â 5% -1.1 8.74 Â 4% perf-
> > profile.children.cycles-pp.generic_perform_write
> > 1.39 Â 5% -1.1 0.32 Â 4% perf-
> > profile.children.cycles-pp.tcp_sendpage_locked
> > 1.36 Â 5% -1.1 0.31 Â 5% perf-
> > profile.children.cycles-pp.do_tcp_sendpages
> > 5.31 Â 5% -1.0 4.27 Â 5% perf-
> > profile.children.cycles-pp.rpc_async_release
> > 4.39 Â 6% -1.0 3.40 Â 2% perf-
> > profile.children.cycles-pp.svc_recvfrom
> > 3.55 Â 5% -0.9 2.64 Â 2% perf-
> > profile.children.cycles-pp.__skb_datagram_iter
> > 3.55 Â 5% -0.9 2.64 Â 2% perf-
> > profile.children.cycles-pp.skb_copy_datagram_iter
> > 5.01 Â 6% -0.9 4.15 Â 3% perf-
> > profile.children.cycles-pp.fsync
> > 4.09 Â 7% -0.9 3.24 Â 5% perf-
> > profile.children.cycles-pp.nfs_write_completion
> > 4.90 Â 6% -0.8 4.05 Â 3% perf-
> > profile.children.cycles-pp.__x64_sys_fsync
> > 4.91 Â 6% -0.8 4.07 Â 3% perf-
> > profile.children.cycles-pp.nfs_file_fsync
> > 4.90 Â 6% -0.8 4.05 Â 3% perf-
> > profile.children.cycles-pp.do_fsync
> > 5.36 Â 4% -0.8 4.53 Â 5% perf-
> > profile.children.cycles-pp.grab_cache_page_write_begin
> > 4.39 Â 7% -0.8 3.60 Â 2% perf-
> > profile.children.cycles-pp.filemap_write_and_wait_range
> > 5.18 Â 2% -0.8 4.40 Â 5% perf-
> > profile.children.cycles-pp.xfs_do_writepage
> > 5.21 Â 4% -0.7 4.47 Â 5% perf-
> > profile.children.cycles-pp.pagecache_get_page
> > 4.90 Â 4% -0.7 4.19 Â 5% perf-
> > profile.children.cycles-pp.brd_insert_page
> > 1.38 Â 9% -0.7 0.68 Â 22% perf-
> > profile.children.cycles-pp.xfs_ilock
> > 1.41 Â 8% -0.7 0.71 Â 22% perf-
> > profile.children.cycles-pp.call_rwsem_down_write_failed
> > 1.41 Â 8% -0.7 0.71 Â 22% perf-
> > profile.children.cycles-pp.rwsem_down_write_failed
> > 1.44 Â 7% -0.7 0.74 Â 21% perf-
> > profile.children.cycles-pp.down_write
> > 4.12 Â 3% -0.6 3.52 Â 4% perf-
> > profile.children.cycles-pp.xfs_add_to_ioend
> > 1.18 Â 6% -0.6 0.61 Â 21% perf-
> > profile.children.cycles-pp.rwsem_spin_on_owner
> > 3.98 Â 5% -0.5 3.44 Â 4% perf-
> > profile.children.cycles-pp.xfs_submit_ioend
> > 6.25 -0.5 5.72 Â 4% perf-
> > profile.children.cycles-pp.iov_iter_copy_from_user_atomic
> > 2.89 Â 6% -0.5 2.38 Â 6% perf-
> > profile.children.cycles-pp.nfs_end_page_writeback
> > 2.78 Â 10% -0.5 2.29 Â 2% perf-
> > profile.children.cycles-pp.nfs_writepages
> > 3.12 Â 7% -0.5 2.64 Â 4% perf-
> > profile.children.cycles-pp.nfs_write_begin
> > 3.37 Â 4% -0.5 2.91 Â 4% perf-
> > profile.children.cycles-pp.nfs_write_end
> > 2.54 Â 4% -0.5 2.08 Â 7% perf-
> > profile.children.cycles-pp.clear_page_erms
> > 2.92 Â 5% -0.4 2.48 Â 3% perf-
> > profile.children.cycles-pp._copy_to_iter
> > 2.60 Â 4% -0.4 2.18 Â 5% perf-
> > profile.children.cycles-pp.nfs_updatepage
> > 2.71 Â 4% -0.4 2.29 Â 6% perf-
> > profile.children.cycles-pp.add_to_page_cache_lru
> > 0.75 Â 3% -0.4 0.37 Â 6% perf-
> > profile.children.cycles-pp.release_sock
> > 2.51 Â 4% -0.4 2.15 Â 7% perf-
> > profile.children.cycles-pp.iomap_write_begin
> > 2.02 Â 10% -0.3 1.67 Â 3% perf-
> > profile.children.cycles-pp.nfs_writepages_callback
> > 2.23 Â 2% -0.3 1.89 Â 4% perf-
> > profile.children.cycles-pp.__filemap_fdatawait_range
> > 0.82 Â 7% -0.3 0.47 Â 7% perf-
> > profile.children.cycles-pp.__tcp_push_pending_frames
> > 1.96 Â 11% -0.3 1.63 Â 3% perf-
> > profile.children.cycles-pp.nfs_do_writepage
> > 1.55 Â 2% -0.3 1.22 Â 5% perf-
> > profile.children.cycles-pp.iomap_write_end
> > 0.78 Â 7% -0.3 0.48 Â 6% perf-
> > profile.children.cycles-pp.svc_send
> > 1.78 Â 3% -0.3 1.48 Â 4% perf-
> > profile.children.cycles-pp.end_page_writeback
> > 1.73 Â 3% -0.3 1.45 Â 4% perf-
> > profile.children.cycles-pp.test_clear_page_writeback
> > 0.90 Â 5% -0.3 0.63 Â 8% perf-
> > profile.children.cycles-pp.xas_load
> > 0.35 Â 14% -0.2 0.10 Â 14% perf-
> > profile.children.cycles-pp.simple_copy_to_iter
> > 1.78 Â 4% -0.2 1.54 Â 7% perf-
> > profile.children.cycles-pp.__wake_up_common
> > 0.43 Â 6% -0.2 0.19 Â 7% perf-
> > profile.children.cycles-pp.lock_sock_nested
> > 1.41 Â 5% -0.2 1.18 Â 4% perf-
> > profile.children.cycles-pp.nfs_commit_release
> > 1.41 Â 5% -0.2 1.18 Â 4% perf-
> > profile.children.cycles-pp.nfs_commit_release_pages
> > 1.95 Â 4% -0.2 1.72 Â 5% perf-
> > profile.children.cycles-pp.try_to_wake_up
> > 0.64 Â 8% -0.2 0.41 Â 4% perf-
> > profile.children.cycles-pp.svc_tcp_sendto
> > 0.63 Â 8% -0.2 0.41 Â 5% perf-
> > profile.children.cycles-pp.svc_sendto
> > 0.62 Â 8% -0.2 0.41 Â 5% perf-
> > profile.children.cycles-pp.svc_send_common
> > 0.62 Â 8% -0.2 0.41 Â 5% perf-
> > profile.children.cycles-pp.kernel_sendpage
> > 1.65 Â 5% -0.2 1.44 Â 7% perf-
> > profile.children.cycles-pp.autoremove_wake_function
> > 1.34 Â 3% -0.2 1.14 Â 3% perf-
> > profile.children.cycles-pp.wait_on_page_bit_common
> > 0.98 Â 4% -0.2 0.79 Â 2% perf-
> > profile.children.cycles-pp.poll_idle
> > 0.26 Â 8% -0.2 0.08 Â 13% perf-
> > profile.children.cycles-pp._raw_spin_lock_bh
> > 1.59 Â 5% -0.2 1.42 Â 5% perf-
> > profile.children.cycles-pp.schedule
> > 0.79 Â 5% -0.2 0.62 Â 4% perf-
> > profile.children.cycles-pp.tcp_v6_do_rcv
> > 0.64 Â 5% -0.2 0.47 Â 8% perf-
> > profile.children.cycles-pp.tcp_v6_rcv
> > 0.34 Â 12% -0.2 0.18 Â 20% perf-
> > profile.children.cycles-pp.xas_start
> > 0.24 Â 26% -0.2 0.08 Â 40% perf-
> > profile.children.cycles-pp.osq_lock
> > 0.76 Â 5% -0.2 0.61 Â 5% perf-
> > profile.children.cycles-pp.tcp_rcv_established
> > 0.49 Â 26% -0.2 0.34 Â 14% perf-
> > profile.children.cycles-pp.nfs_request_add_commit_list
> > 0.66 Â 5% -0.1 0.52 Â 8% perf-
> > profile.children.cycles-pp.ip6_input
> > 0.66 Â 4% -0.1 0.51 Â 8% perf-
> > profile.children.cycles-pp.ip6_input_finish
> > 0.65 Â 4% -0.1 0.51 Â 9% perf-
> > profile.children.cycles-pp.ip6_protocol_deliver_rcu
> > 0.79 Â 8% -0.1 0.64 Â 7% perf-
> > profile.children.cycles-pp.__set_page_dirty_nobuffers
> > 0.83 Â 6% -0.1 0.69 Â 8% perf-
> > profile.children.cycles-pp.__nfs_commit_inode
> > 0.42 Â 4% -0.1 0.29 Â 5% perf-
> > profile.children.cycles-pp.xs_stream_data_receive_workfn
> > 0.42 Â 4% -0.1 0.29 Â 5% perf-
> > profile.children.cycles-pp.xs_stream_data_receive
> > 0.70 Â 4% -0.1 0.56 Â 6% perf-
> > profile.children.cycles-pp.percpu_counter_add_batch
> > 0.71 Â 4% -0.1 0.58 Â 8% perf-
> > profile.children.cycles-pp.ipv6_rcv
> > 0.86 Â 6% -0.1 0.73 Â 6% perf-
> > profile.children.cycles-pp.__local_bh_enable_ip
> > 0.78 Â 5% -0.1 0.65 Â 8% perf-
> > profile.children.cycles-pp.process_backlog
> > 0.45 Â 14% -0.1 0.32 Â 14% perf-
> > profile.children.cycles-pp.__mutex_lock
> > 1.06 Â 4% -0.1 0.93 Â 7% perf-
> > profile.children.cycles-pp._raw_spin_lock_irqsave
> > 0.86 Â 6% -0.1 0.73 Â 9% perf-
> > profile.children.cycles-pp.pagevec_lru_move_fn
> > 0.75 Â 5% -0.1 0.63 Â 8% perf-
> > profile.children.cycles-pp.__netif_receive_skb_one_core
> > 0.61 Â 8% -0.1 0.49 Â 9% perf-
> > profile.children.cycles-pp.iomap_set_page_dirty
> > 0.94 Â 5% -0.1 0.81 Â 8% perf-
> > profile.children.cycles-pp.__lru_cache_add
> > 0.82 Â 6% -0.1 0.70 Â 2% perf-
> > profile.children.cycles-pp.clear_page_dirty_for_io
> > 0.80 Â 6% -0.1 0.68 Â 7% perf-
> > profile.children.cycles-pp.net_rx_action
> > 0.94 Â 2% -0.1 0.82 Â 3% perf-
> > profile.children.cycles-pp.io_schedule
> > 0.81 Â 7% -0.1 0.70 Â 7% perf-
> > profile.children.cycles-pp.do_softirq_own_stack
> > 0.69 -0.1 0.58 Â 5% perf-
> > profile.children.cycles-pp.nfs_create_request
> > 0.62 Â 4% -0.1 0.51 Â 5% perf-
> > profile.children.cycles-pp.__pagevec_lru_add_fn
> > 0.32 Â 4% -0.1 0.21 Â 11% perf-
> > profile.children.cycles-pp.xs_sock_recvmsg
> > 0.49 Â 12% -0.1 0.38 Â 7% perf-
> > profile.children.cycles-pp.sched_ttwu_pending
> > 0.82 Â 6% -0.1 0.72 Â 7% perf-
> > profile.children.cycles-pp.do_softirq
> > 0.32 Â 10% -0.1 0.22 Â 7% perf-
> > profile.children.cycles-pp.xfs_map_blocks
> > 0.40 Â 13% -0.1 0.30 Â 12% perf-
> > profile.children.cycles-pp.mutex_spin_on_owner
> > 0.47 Â 2% -0.1 0.38 Â 5% perf-
> > profile.children.cycles-pp.kmem_cache_alloc
> > 0.26 Â 4% -0.1 0.17 Â 8% perf-
> > profile.children.cycles-pp.__wake_up_common_lock
> > 0.52 Â 6% -0.1 0.43 Â 4% perf-
> > profile.children.cycles-pp.account_page_dirtied
> > 0.22 Â 16% -0.1 0.13 Â 9% perf-
> > profile.children.cycles-pp.__check_object_size
> > 0.50 Â 8% -0.1 0.42 Â 7% perf-
> > profile.children.cycles-pp.iomap_set_range_uptodate
> > 0.41 Â 3% -0.1 0.33 Â 14% perf-
> > profile.children.cycles-pp.__set_page_dirty
> > 0.31 Â 11% -0.1 0.24 Â 11% perf-
> > profile.children.cycles-pp.nfs_io_completion_release
> > 0.41 Â 8% -0.1 0.33 Â 7% perf-
> > profile.children.cycles-pp.nfs_inode_remove_request
> > 0.25 Â 9% -0.1 0.18 Â 8% perf-
> > profile.children.cycles-pp.xfs_iomap_write_allocate
> > 0.33 Â 8% -0.1 0.26 Â 4% perf-
> > profile.children.cycles-pp.___might_sleep
> > 0.25 Â 5% -0.1 0.18 Â 15% perf-
> > profile.children.cycles-pp.__xa_set_mark
> > 0.31 Â 9% -0.1 0.24 Â 5% perf-
> > profile.children.cycles-pp.nfs_scan_commit_list
> > 0.25 Â 5% -0.1 0.18 Â 8% perf-
> > profile.children.cycles-pp.__x86_indirect_thunk_rax
> > 0.45 Â 6% -0.1 0.39 Â 4% perf-
> > profile.children.cycles-pp.nfs_lock_and_join_requests
> > 0.50 Â 4% -0.1 0.45 Â 5% perf-
> > profile.children.cycles-pp.nfs_page_group_destroy
> > 0.19 Â 11% -0.1 0.14 Â 8% perf-
> > profile.children.cycles-pp.xfs_bmap_btalloc
> > 0.21 Â 2% -0.1 0.15 Â 14% perf-
> > profile.children.cycles-pp.__lock_sock
> > 0.17 Â 8% -0.1 0.12 Â 7% perf-
> > profile.children.cycles-pp.xfs_alloc_ag_vextent
> > 0.18 Â 9% -0.1 0.13 Â 6% perf-
> > profile.children.cycles-pp.xfs_alloc_vextent
> > 0.20 Â 12% -0.1 0.15 Â 5% perf-
> > profile.children.cycles-pp.xfs_bmapi_write
> > 0.30 Â 9% -0.0 0.25 Â 5% perf-
> > profile.children.cycles-pp.select_task_rq_fair
> > 0.16 Â 8% -0.0 0.11 Â 7% perf-
> > profile.children.cycles-pp.xfs_alloc_ag_vextent_near
> > 0.16 Â 15% -0.0 0.12 Â 17% perf-
> > profile.children.cycles-pp.__generic_write_end
> > 0.48 Â 2% -0.0 0.44 Â 7% perf-
> > profile.children.cycles-pp.dequeue_task_fair
> > 0.31 Â 2% -0.0 0.27 Â 10% perf-
> > profile.children.cycles-pp.release_pages
> > 0.20 Â 4% -0.0 0.17 Â 10% perf-
> > profile.children.cycles-pp.nfs_initiate_commit
> > 0.17 Â 10% -0.0 0.13 Â 5% perf-
> > profile.children.cycles-pp.__might_sleep
> > 0.29 Â 6% -0.0 0.26 Â 4% perf-
> > profile.children.cycles-pp.dec_zone_page_state
> > 0.23 Â 7% -0.0 0.20 Â 7% perf-
> > profile.children.cycles-pp.__pagevec_release
> > 0.19 Â 9% -0.0 0.15 Â 9% perf-
> > profile.children.cycles-pp.nfs_get_lock_context
> > 0.17 Â 13% -0.0 0.13 Â 8% perf-
> > profile.children.cycles-pp.check_preempt_curr
> > 0.09 Â 17% -0.0 0.06 Â 20% perf-
> > profile.children.cycles-pp.memset_erms
> > 0.14 Â 8% -0.0 0.11 Â 6% perf-
> > profile.children.cycles-pp.nfs_request_remove_commit_list
> > 0.07 Â 10% -0.0 0.04 Â 58% perf-
> > profile.children.cycles-pp.xfs_defer_finish_noroll
> > 0.15 Â 5% -0.0 0.12 Â 6% perf-
> > profile.children.cycles-pp.__nfs_find_lock_context
> > 0.19 Â 4% -0.0 0.16 Â 11% perf-
> > profile.children.cycles-pp.mem_cgroup_try_charge
> > 0.17 Â 8% -0.0 0.15 Â 10% perf-
> > profile.children.cycles-pp.vfs_create
> > 0.16 Â 2% -0.0 0.14 Â 8% perf-
> > profile.children.cycles-pp.__fprop_inc_percpu_max
> > 0.16 Â 8% -0.0 0.13 Â 8% perf-
> > profile.children.cycles-pp.xfs_create
> > 0.10 Â 4% -0.0 0.07 Â 17% perf-
> > profile.children.cycles-pp.xfs_file_aio_write_checks
> > 0.11 Â 10% -0.0 0.08 Â 15% perf-
> > profile.children.cycles-pp.nfs_pageio_doio
> > 0.11 Â 10% -0.0 0.08 Â 15% perf-
> > profile.children.cycles-pp.nfs_generic_pg_pgios
> > 0.19 Â 4% -0.0 0.17 Â 7% perf-
> > profile.children.cycles-pp.update_rq_clock
> > 0.15 Â 4% -0.0 0.13 Â 5% perf-
> > profile.children.cycles-pp.__xfs_trans_commit
> > 0.08 Â 15% -0.0 0.06 Â 6% perf-
> > profile.children.cycles-pp.get_mem_cgroup_from_mm
> > 0.06 Â 15% +0.0 0.07 Â 14% perf-
> > profile.children.cycles-pp.selinux_ip_postroute
> > 0.08 Â 16% +0.0 0.10 Â 10% perf-
> > profile.children.cycles-pp.nf_hook_slow
> > 0.04 Â 60% +0.0 0.08 Â 19% perf-
> > profile.children.cycles-pp.__inc_numa_state
> > 0.03 Â100% +0.0 0.07 Â 12% perf-
> > profile.children.cycles-pp.get_task_policy
> > 0.41 Â 6% +0.0 0.46 perf-
> > profile.children.cycles-pp.__release_sock
> > 0.14 Â 11% +0.1 0.19 Â 8% perf-
> > profile.children.cycles-pp.__list_add_valid
> > 0.09 Â 17% +0.1 0.14 Â 10% perf-
> > profile.children.cycles-pp.svc_xprt_do_enqueue
> > 0.14 Â 10% +0.1 0.20 Â 2% perf-
> > profile.children.cycles-pp.tcp_clean_rtx_queue
> > 0.00 +0.1 0.06 Â 11% perf-
> > profile.children.cycles-pp.free_unref_page_commit
> > 0.10 Â 11% +0.1 0.17 Â 6% perf-
> > profile.children.cycles-pp.iov_iter_advance
> > 0.17 Â 9% +0.1 0.24 Â 2% perf-
> > profile.children.cycles-pp.tcp_ack
> > 0.11 Â 21% +0.1 0.18 Â 2% perf-
> > profile.children.cycles-pp.schedule_timeout
> > 0.00 +0.1 0.08 Â 17% perf-
> > profile.children.cycles-pp.free_one_page
> > 0.23 Â 12% +0.1 0.32 Â 4% perf-
> > profile.children.cycles-pp.__kfree_skb
> > 0.10 Â 4% +0.1 0.19 Â 6% perf-
> > profile.children.cycles-pp.xas_create
> > 0.12 Â 17% +0.1 0.24 Â 4% perf-
> > profile.children.cycles-pp.skb_release_data
> > 0.00 +0.1 0.14 Â 5% perf-
> > profile.children.cycles-pp.__free_pages_ok
> > 0.00 +0.2 0.21 Â 9% perf-
> > profile.children.cycles-pp.skb_page_frag_refill
> > 0.00 +0.2 0.22 Â 9% perf-
> > profile.children.cycles-pp.sk_page_frag_refill
> > 0.00 +0.2 0.25 Â 5% perf-
> > profile.children.cycles-pp.__sk_flush_backlog
> > 0.03 Â100% +0.4 0.39 Â 7% perf-
> > profile.children.cycles-pp.free_pcppages_bulk
> > 0.07 Â 29% +0.5 0.54 Â 6% perf-
> > profile.children.cycles-pp.free_unref_page
> > 4.86 Â 6% +0.5 5.40 Â 2% perf-
> > profile.children.cycles-pp.svc_recv
> > 9.98 Â 4% +2.5 12.44 Â 4% perf-
> > profile.children.cycles-pp.worker_thread
> > 9.61 Â 4% +2.5 12.15 Â 4% perf-
> > profile.children.cycles-pp.process_one_work
> > 47.02 Â 3% +2.7 49.68 Â 4% perf-
> > profile.children.cycles-pp.secondary_startup_64
> > 47.02 Â 3% +2.7 49.68 Â 4% perf-
> > profile.children.cycles-pp.cpu_startup_entry
> > 47.02 Â 3% +2.7 49.69 Â 4% perf-
> > profile.children.cycles-pp.do_idle
> > 6.12 Â 2% +2.8 8.88 Â 3% perf-
> > profile.children.cycles-pp.memcpy_erms
> > 0.00 +3.6 3.58 Â 3% perf-
> > profile.children.cycles-pp.memcpy_from_page
> > 0.00 +3.7 3.68 Â 3% perf-
> > profile.children.cycles-pp._copy_from_iter_full
> > 2.43 Â 6% +3.8 6.20 Â 3% perf-
> > profile.children.cycles-pp.__rpc_execute
> > 2.29 Â 6% +3.8 6.05 Â 3% perf-
> > profile.children.cycles-pp.rpc_async_schedule
> > 1.93 Â 3% +3.8 5.71 Â 4% perf-
> > profile.children.cycles-pp.call_transmit
> > 1.92 Â 4% +3.8 5.71 Â 4% perf-
> > profile.children.cycles-pp.xprt_transmit
> > 1.87 Â 4% +3.8 5.67 Â 4% perf-
> > profile.children.cycles-pp.xs_tcp_send_request
> > 1.86 Â 4% +3.8 5.67 Â 4% perf-
> > profile.children.cycles-pp.xs_sendpages
> > 0.21 Â 7% +5.3 5.54 Â 4% perf-
> > profile.children.cycles-pp.tcp_sendmsg_locked
> > 0.24 Â 8% +5.4 5.63 Â 4% perf-
> > profile.children.cycles-pp.tcp_sendmsg
> > 0.25 Â 9% +5.4 5.64 Â 4% perf-
> > profile.children.cycles-pp.sock_sendmsg
> > 1.17 Â 7% -0.6 0.61 Â 21% perf-
> > profile.self.cycles-pp.rwsem_spin_on_owner
> > 2.50 Â 4% -0.5 2.05 Â 7% perf-
> > profile.self.cycles-pp.clear_page_erms
> > 2.46 Â 4% -0.4 2.06 Â 5% perf-
> > profile.self.cycles-pp.brd_do_bvec
> > 0.22 Â 10% -0.2 0.03 Â100% perf-
> > profile.self.cycles-pp.__skb_datagram_iter
> > 1.53 Â 6% -0.2 1.34 Â 3% perf-
> > profile.self.cycles-pp._raw_spin_lock
> > 0.91 Â 4% -0.2 0.74 Â 2% perf-
> > profile.self.cycles-pp.poll_idle
> > 0.24 Â 25% -0.2 0.08 Â 40% perf-
> > profile.self.cycles-pp.osq_lock
> > 0.74 Â 14% -0.2 0.59 Â 10% perf-
> > profile.self.cycles-pp.nfs_do_writepage
> > 0.32 Â 13% -0.1 0.17 Â 18% perf-
> > profile.self.cycles-pp.xas_start
> > 0.97 Â 3% -0.1 0.84 Â 8% perf-
> > profile.self.cycles-pp._raw_spin_lock_irqsave
> > 0.57 Â 6% -0.1 0.45 Â 9% perf-
> > profile.self.cycles-pp.xas_load
> > 0.55 Â 10% -0.1 0.43 Â 7% perf-
> > profile.self.cycles-pp.__schedule
> > 0.14 Â 12% -0.1 0.03 Â100% perf-
> > profile.self.cycles-pp._raw_spin_lock_bh
> > 0.53 Â 8% -0.1 0.43 Â 9% perf-
> > profile.self.cycles-pp.percpu_counter_add_batch
> > 0.40 Â 13% -0.1 0.30 Â 12% perf-
> > profile.self.cycles-pp.mutex_spin_on_owner
> > 0.33 Â 8% -0.1 0.24 Â 12% perf-
> > profile.self.cycles-pp.nfs_end_page_writeback
> > 0.74 Â 6% -0.1 0.65 Â 8% perf-
> > profile.self.cycles-pp.nfs_updatepage
> > 0.40 Â 8% -0.1 0.31 Â 11% perf-
> > profile.self.cycles-pp.__test_set_page_writeback
> > 0.50 Â 8% -0.1 0.41 Â 7% perf-
> > profile.self.cycles-pp.iomap_set_range_uptodate
> > 0.56 Â 5% -0.1 0.48 Â 8% perf-
> > profile.self.cycles-pp.try_to_wake_up
> > 0.38 Â 8% -0.1 0.30 Â 5% perf-
> > profile.self.cycles-pp._raw_spin_unlock_irqrestore
> > 0.28 Â 9% -0.1 0.20 Â 6% perf-
> > profile.self.cycles-pp.iomap_write_end
> > 0.45 Â 7% -0.1 0.38 Â 6% perf-
> > profile.self.cycles-pp.__pagevec_lru_add_fn
> > 0.15 Â 14% -0.1 0.08 Â 27% perf-
> > profile.self.cycles-pp.add_to_page_cache_lru
> > 0.43 Â 6% -0.1 0.36 perf-
> > profile.self.cycles-pp.clear_page_dirty_for_io
> > 0.51 Â 8% -0.1 0.45 Â 7% perf-
> > profile.self.cycles-pp.test_clear_page_writeback
> > 0.20 Â 9% -0.1 0.14 Â 11% perf-
> > profile.self.cycles-pp.xas_store
> > 0.31 Â 8% -0.1 0.25 Â 4% perf-
> > profile.self.cycles-pp.___might_sleep
> > 0.15 Â 20% -0.1 0.09 Â 12% perf-
> > profile.self.cycles-pp.__check_object_size
> > 0.22 Â 6% -0.1 0.17 Â 10% perf-
> > profile.self.cycles-pp.__x86_indirect_thunk_rax
> > 0.08 Â 11% -0.0 0.03 Â100% perf-
> > profile.self.cycles-pp.switch_mm_irqs_off
> > 0.28 Â 4% -0.0 0.24 Â 10% perf-
> > profile.self.cycles-pp.release_pages
> > 0.27 Â 8% -0.0 0.23 Â 2% perf-
> > profile.self.cycles-pp.dec_zone_page_state
> > 0.19 Â 6% -0.0 0.15 Â 12% perf-
> > profile.self.cycles-pp.kmem_cache_alloc
> > 0.15 Â 10% -0.0 0.11 Â 7% perf-
> > profile.self.cycles-pp.__might_sleep
> > 0.25 Â 10% -0.0 0.21 Â 6% perf-
> > profile.self.cycles-pp.wait_on_page_bit_common
> > 0.07 Â 12% -0.0 0.03 Â100% perf-
> > profile.self.cycles-pp.___slab_alloc
> > 0.23 Â 7% -0.0 0.20 Â 10% perf-
> > profile.self.cycles-pp._raw_spin_lock_irq
> > 0.18 Â 5% -0.0 0.15 Â 2% perf-
> > profile.self.cycles-pp.select_task_rq_fair
> > 0.14 Â 5% -0.0 0.11 Â 17% perf-
> > profile.self.cycles-pp.alloc_pages_current
> > 0.09 Â 16% -0.0 0.06 Â 20% perf-
> > profile.self.cycles-pp.memset_erms
> > 0.10 Â 8% -0.0 0.07 perf-
> > profile.self.cycles-pp._cond_resched
> > 0.15 Â 4% -0.0 0.12 Â 6% perf-
> > profile.self.cycles-pp.__nfs_find_lock_context
> > 0.11 Â 13% -0.0 0.09 Â 14% perf-
> > profile.self.cycles-pp.nfs_request_add_commit_list_locked
> > 0.09 Â 4% -0.0 0.07 perf-
> > profile.self.cycles-pp.nfs_request_remove_commit_list
> > 0.08 Â 10% -0.0 0.06 Â 11% perf-
> > profile.self.cycles-pp.get_mem_cgroup_from_mm
> > 0.08 Â 10% -0.0 0.06 Â 11% perf-
> > profile.self.cycles-pp.nfs_request_add_commit_list
> > 0.10 Â 5% -0.0 0.08 Â 5% perf-
> > profile.self.cycles-pp.___perf_sw_event
> > 0.08 Â 13% +0.0 0.11 Â 10% perf-
> > profile.self.cycles-pp.mem_cgroup_commit_charge
> > 0.03 Â100% +0.0 0.07 Â 12% perf-
> > profile.self.cycles-pp.run_timer_softirq
> > 0.03 Â100% +0.0 0.07 Â 12% perf-
> > profile.self.cycles-pp.get_task_policy
> > 0.03 Â100% +0.0 0.07 Â 17% perf-
> > profile.self.cycles-pp.__inc_numa_state
> > 0.03 Â102% +0.1 0.08 Â 8% perf-
> > profile.self.cycles-pp.svc_recv
> > 0.11 Â 13% +0.1 0.17 Â 12% perf-
> > profile.self.cycles-pp.__list_add_valid
> > 0.09 Â 14% +0.1 0.16 Â 9% perf-
> > profile.self.cycles-pp.iov_iter_advance
> > 0.06 Â 13% +0.1 0.15 Â 8% perf-
> > profile.self.cycles-pp.xas_create
> > 0.00 +0.1 0.15 Â 14% perf-
> > profile.self.cycles-pp.tcp_sendmsg_locked
> > 0.00 +0.2 0.20 Â 11% perf-
> > profile.self.cycles-pp.free_pcppages_bulk
> > 0.04 Â 60% +0.2 0.28 Â 7% perf-
> > profile.self.cycles-pp.svc_tcp_recvfrom
> > 1.25 +0.3 1.52 Â 5% perf-
> > profile.self.cycles-pp.get_page_from_freelist
> > 6.04 Â 2% +2.7 8.74 Â 3% perf-
> > profile.self.cycles-pp.memcpy_erms
> >
> >
> >
> >
> > fsmark.time.percent_of_cpu_this_job_got
> >
> >   [ASCII run chart omitted: bisect-good (*) samples sit at roughly 260-300,
> >    bisect-bad (O) samples at roughly 190-210]
> >
> >
> >
> >
> >
> > fsmark.time.elapsed_time
> >
> >   [ASCII run chart omitted: bisect-good (*) samples at roughly 23-26 seconds,
> >    bisect-bad (O) samples at roughly 30-32 seconds]
> >
> >
> >
> >
> >
> > fsmark.time.elapsed_time.max
> >
> >   [ASCII run chart omitted: bisect-good (*) samples at roughly 23-26 seconds,
> >    bisect-bad (O) samples at roughly 30-32 seconds]
> >
> >
> >
> >
> >
> > fsmark.files_per_sec
> >
> >   [ASCII run chart omitted: bisect-good (*) samples at roughly 480-520,
> >    bisect-bad (O) samples at roughly 370-400]
> >
> >
> >
> >
> > [*] bisect-good sample
> > [O] bisect-bad sample
> >
> >
> >
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are
> > provided for informational purposes only. Any difference in system
> > hardware or software design or configuration may affect actual
> > performance.
> >
> >
> > Thanks,
> > Rong Chen
> >
> >
> >
>
> Do you have time to take a look at this regression?

From your stats, it looks to me as if the problem is increased NUMA
overhead. Pretty much everything else appears to be the same or
actually performing better than previously. Am I interpreting that
correctly?

If my interpretation above is correct, then I'm not seeing where this
patch would be introducing new NUMA regressions. It is just converting
from using one method of doing socket I/O to another. Could it perhaps
be a memory artefact due to your running the NFS client and server on
the same machine?
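
To illustrate what I mean by "one method of socket I/O to another", here
is a very rough sketch, not the code from the patch itself: the two
helper functions below are invented for illustration, and error/partial
send handling is elided; only kernel_sendpage(), iov_iter_bvec() and
sock_sendmsg() are real kernel interfaces. Before the change we pushed
each page of the send buffer through the sendpage path; afterwards we
describe the same pages with a bio_vec-backed iov_iter and hand the lot
to sock_sendmsg() in a single call.

/*
 * Illustration only -- hypothetical helpers, real socket APIs.
 */
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/net.h>
#include <linux/socket.h>
#include <linux/bvec.h>
#include <linux/uio.h>

/* Old scheme: hand each page to TCP via the sendpage path
 * (inet_sendpage -> tcp_sendpage -> do_tcp_sendpages in your profile). */
static int xmit_pages_sendpage(struct socket *sock, struct page **pages,
			       unsigned int npages, size_t len)
{
	unsigned int i;

	for (i = 0; i < npages && len; i++) {
		size_t n = min_t(size_t, len, PAGE_SIZE);
		int ret = kernel_sendpage(sock, pages[i], 0, n,
					  len > n ? MSG_MORE : 0);
		if (ret < 0)
			return ret;
		len -= ret;
	}
	return 0;
}

/* New scheme: describe all the pages with a bio_vec-backed iov_iter and
 * send them with one sock_sendmsg() call.  TCP then copies the data out
 * of the iterator (tcp_sendmsg_locked -> _copy_from_iter_full), which is
 * the new memcpy_erms hot spot in your profile. */
static int xmit_pages_iov_iter(struct socket *sock, struct bio_vec *bvec,
			       struct page **pages, unsigned int npages,
			       size_t len)
{
	struct msghdr msg = { .msg_flags = MSG_DONTWAIT };
	size_t remaining = len;
	unsigned int i;

	for (i = 0; i < npages; i++) {
		bvec[i].bv_page = pages[i];
		bvec[i].bv_offset = 0;
		bvec[i].bv_len = min_t(size_t, remaining, PAGE_SIZE);
		remaining -= bvec[i].bv_len;
	}
	iov_iter_bvec(&msg.msg_iter, WRITE, bvec, npages, len);
	return sock_sendmsg(sock, &msg);
}

If I'm reading your profile right, the practical difference is that the
sendpage path attached the page itself to the skb, whereas the sendmsg
path copies the payload into newly allocated socket buffer pages. That
would account for the extra _copy_from_iter_full/memcpy_erms time and
the higher pgalloc/pgfree numbers above, but not, as far as I can see,
for a NUMA-locality change by itself.
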
Apologies for pushing back a little, but I just don't have the
hardware available to test NUMA configurations, so I'm relying on
external testing for the above kind of scenario.
Thanks
Trond
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx