Re: [LKP] [lkp] [mm] 795ae7a0de: pixz.throughput -9.1% regression

From: Johannes Weiner
Date: Wed Jun 22 2016 - 17:27:03 EST


Hi,

On Wed, Jun 08, 2016 at 01:37:26PM +0800, Ye Xiaolong wrote:
> On Tue, Jun 07, 2016 at 05:56:27PM -0400, Johannes Weiner wrote:
> >But just to make sure I'm looking at the right code, can you first try
> >the following patch on top of Linus's current tree and see if that
> >gets performance back to normal? It's a partial revert of the
> >watermark changes that singles out the fair zone allocator:
>
> Seems that this patch doesn't help to get performance back.
> I've attached the comparison result among 3ed3a4f, 795ae7a0, v4.7-rc2 and
> 1fe49ba5 ("mm: revert fairness batching to before the watermarks were")
> with perf profile information. You can find it via searching 'perf-profile'.

Sorry for the delay, and thank you for running these. I still can't
reproduce this.

> 3ed3a4f0ddffece9 795ae7a0de6b834a0cc202aa55 v4.7-rc2 1fe49ba5002a50aefd5b6c4913
> ---------------- -------------------------- -------------------------- --------------------------
> fail:runs %reproduction fail:runs %reproduction fail:runs %reproduction fail:runs
> | | | | | | |
> :4 0% :7 0% :4 50% 2:4 kmsg.DHCP/BOOTP:Reply_not_for_us,op[#]xid[#]
> :4 50% 2:7 0% :4 0% :4 kmsg.Spurious_LAPIC_timer_interrupt_on_cpu
> :4 0% :7 14% 1:4 25% 1:4 kmsg.igb#:#:#:exceed_max#second
> %stddev %change %stddev %change %stddev %change %stddev
> \ | \ | \ | \
> 78505362 ± 0% -9.2% 71298182 ± 0% -11.8% 69280014 ± 0% -9.1% 71350485 ± 0% pixz.throughput
> 5586220 ± 2% -1.6% 5498492 ± 2% +6.5% 5950210 ± 1% +8.4% 6052963 ± 1% pixz.time.involuntary_context_switches
> 4582198 ± 2% -3.6% 4416275 ± 2% -8.6% 4189304 ± 4% -8.0% 4214839 ± 0% pixz.time.minor_page_faults
> 4530 ± 0% +1.0% 4575 ± 0% -1.6% 4458 ± 0% -1.3% 4469 ± 0% pixz.time.percent_of_cpu_this_job_got
> 92.03 ± 0% +5.6% 97.23 ± 11% +31.3% 120.83 ± 1% +30.4% 119.98 ± 0% pixz.time.system_time
> 14911 ± 0% +2.1% 15218 ± 0% -1.0% 14759 ± 1% -1.0% 14764 ± 0% pixz.time.user_time
> 6586930 ± 0% -8.4% 6033444 ± 1% -4.4% 6295529 ± 1% -2.6% 6416460 ± 1% pixz.time.voluntary_context_switches
> 2179703 ± 4% +4.8% 2285049 ± 2% -15.3% 1846752 ± 16% -8.2% 2000913 ± 4% softirqs.RCU
> 92.03 ± 0% +5.6% 97.23 ± 11% +31.3% 120.83 ± 1% +30.4% 119.98 ± 0% time.system_time
> 2237 ± 2% -2.9% 2172 ± 7% +16.3% 2601 ± 7% +8.0% 2416 ± 6% uptime.idle
> 49869 ± 1% -12.6% 43583 ± 8% -18.0% 40917 ± 0% -16.3% 41728 ± 1% vmstat.system.cs
> 97890 ± 1% -0.0% 97848 ± 3% +7.4% 105143 ± 2% +6.8% 104518 ± 2% vmstat.system.in
> 105682 ± 1% +0.6% 106297 ± 1% -85.2% 15631 ± 4% -85.1% 15768 ± 1% meminfo.Active(file)
> 390126 ± 0% -0.2% 389529 ± 0% +23.9% 483296 ± 0% +23.9% 483194 ± 0% meminfo.Inactive
> 380750 ± 0% -0.2% 380141 ± 0% +24.5% 473891 ± 0% +24.4% 473760 ± 0% meminfo.Inactive(file)
> 2401 ±107% +76.9% 4247 ± 79% -99.8% 5.75 ± 18% -99.7% 6.75 ± 39% numa-numastat.node0.other_node
> 2074670 ± 2% -11.3% 1840052 ± 11% -21.1% 1637071 ± 12% -22.5% 1607724 ± 7% numa-numastat.node1.local_node
> 2081648 ± 2% -11.4% 1844923 ± 11% -21.4% 1637081 ± 12% -22.8% 1607730 ± 7% numa-numastat.node1.numa_hit
> 6977 ± 36% -30.2% 4871 ± 66% -99.8% 10.50 ± 17% -99.9% 5.50 ± 20% numa-numastat.node1.other_node
> 13061458 ± 19% -3.3% 12634644 ± 24% +33.5% 17435714 ± 47% +58.3% 20674526 ± 14% cpuidle.C1-IVT.time
> 193807 ± 15% +26.8% 245657 ± 76% +101.8% 391021 ± 8% +115.5% 417669 ± 20% cpuidle.C1-IVT.usage
> 8.866e+08 ± 2% -15.6% 7.479e+08 ± 6% +25.0% 1.108e+09 ± 5% +21.0% 1.073e+09 ± 4% cpuidle.C6-IVT.time
> 93283 ± 0% -13.2% 80988 ± 3% +300.6% 373726 ±121% +20.8% 112719 ± 1% cpuidle.C6-IVT.usage
> 8559466 ± 20% -39.3% 5195127 ± 40% -98.1% 159481 ±173% -100.0% 97.50 ± 40% cpuidle.POLL.time
> 771388 ± 9% -53.4% 359081 ± 52% -99.9% 959.00 ±167% -100.0% 40.50 ± 39% cpuidle.POLL.usage
> 94.35 ± 0% +1.0% 95.28 ± 0% -1.6% 92.81 ± 0% -1.4% 93.00 ± 0% turbostat.%Busy
> 2824 ± 0% +1.0% 2851 ± 0% -1.6% 2777 ± 0% -1.4% 2784 ± 0% turbostat.Avg_MHz
> 3.57 ± 3% -20.9% 2.83 ± 6% +18.6% 4.24 ± 6% +9.4% 3.91 ± 4% turbostat.CPU%c1
> 2.07 ± 3% -8.8% 1.89 ± 10% +42.0% 2.95 ± 13% +48.7% 3.08 ± 4% turbostat.CPU%c6
> 157.67 ± 0% -0.7% 156.51 ± 0% -1.4% 155.47 ± 0% -1.4% 155.39 ± 0% turbostat.CorWatt
> 0.17 ± 17% -2.9% 0.17 ± 23% +151.4% 0.44 ± 23% +88.6% 0.33 ± 11% turbostat.Pkg%pc2
> 192.71 ± 0% -0.8% 191.15 ± 0% -1.4% 190.10 ± 0% -1.3% 190.12 ± 0% turbostat.PkgWatt
> 22.36 ± 0% -8.4% 20.49 ± 0% -10.3% 20.05 ± 0% -8.1% 20.55 ± 0% turbostat.RAMWatt
> 53301 ± 2% +0.3% 53439 ± 5% -85.3% 7826 ± 4% -85.2% 7898 ± 1% numa-meminfo.node0.Active(file)
> 194536 ± 2% +0.8% 196145 ± 2% +24.4% 241970 ± 1% +25.2% 243537 ± 1% numa-meminfo.node0.Inactive
> 189951 ± 0% -0.1% 189801 ± 1% +24.7% 236921 ± 0% +24.7% 236864 ± 0% numa-meminfo.node0.Inactive(file)
> 10240 ± 2% -1.0% 10138 ± 3% -16.2% 8580 ± 3% -17.6% 8442 ± 1% numa-meminfo.node0.KernelStack
> 26406 ± 4% -8.4% 24183 ± 7% -10.2% 23723 ± 2% -4.7% 25152 ± 5% numa-meminfo.node0.SReclaimable
> 52381 ± 1% +0.9% 52856 ± 3% -85.1% 7804 ± 4% -85.0% 7867 ± 2% numa-meminfo.node1.Active(file)
> 195602 ± 2% -1.1% 193393 ± 2% +23.4% 241343 ± 1% +22.5% 239683 ± 1% numa-meminfo.node1.Inactive
> 190797 ± 0% -0.2% 190340 ± 1% +24.2% 236969 ± 0% +24.2% 236897 ± 0% numa-meminfo.node1.Inactive(file)
> 4188 ± 6% +2.4% 4289 ± 5% +42.2% 5955 ± 4% +45.0% 6073 ± 2% numa-meminfo.node1.KernelStack
> 22906 ± 4% +10.5% 25314 ± 6% +13.4% 25980 ± 2% +8.5% 24850 ± 5% numa-meminfo.node1.SReclaimable
> 13324 ± 2% +0.3% 13359 ± 5% -85.3% 1956 ± 4% -85.2% 1974 ± 1% numa-vmstat.node0.nr_active_file
> 454.25 ± 2% +773.4% 3967 ± 2% +794.3% 4062 ± 3% +194.4% 1337 ± 2% numa-vmstat.node0.nr_alloc_batch
> 47488 ± 0% -0.1% 47449 ± 1% +24.7% 59229 ± 0% +24.7% 59215 ± 0% numa-vmstat.node0.nr_inactive_file
> 639.25 ± 2% -1.0% 633.00 ± 3% -16.2% 536.00 ± 3% -17.5% 527.25 ± 1% numa-vmstat.node0.nr_kernel_stack
> 6600 ± 4% -8.4% 6045 ± 7% -10.2% 5930 ± 2% -4.7% 6287 ± 5% numa-vmstat.node0.nr_slab_reclaimable
> 69675 ± 3% +3.0% 71759 ± 4% -100.0% 2.50 ± 66% -100.0% 3.75 ± 51% numa-vmstat.node0.numa_other
> 13094 ± 1% +0.9% 13213 ± 3% -85.1% 1950 ± 4% -85.0% 1966 ± 2% numa-vmstat.node1.nr_active_file
> 563.00 ± 2% +642.6% 4181 ± 3% +631.5% 4118 ± 4% +162.8% 1479 ± 2% numa-vmstat.node1.nr_alloc_batch
> 47699 ± 0% -0.2% 47584 ± 1% +24.2% 59241 ± 0% +24.2% 59223 ± 0% numa-vmstat.node1.nr_inactive_file
> 261.25 ± 6% +2.4% 267.57 ± 5% +42.3% 371.75 ± 4% +45.3% 379.50 ± 2% numa-vmstat.node1.nr_kernel_stack
> 5726 ± 4% +10.5% 6328 ± 6% +13.4% 6495 ± 2% +8.5% 6212 ± 5% numa-vmstat.node1.nr_slab_reclaimable
> 1254802 ± 3% -9.6% 1134298 ± 10% -19.6% 1008654 ± 9% -21.0% 990900 ± 5% numa-vmstat.node1.numa_hit
> 1232554 ± 3% -9.6% 1113884 ± 10% -18.2% 1008648 ± 9% -19.6% 990898 ± 5% numa-vmstat.node1.numa_local
> 22247 ± 11% -8.2% 20414 ± 16% -100.0% 5.75 ± 18% -100.0% 1.75 ± 24% numa-vmstat.node1.numa_other
> 26419 ± 1% +0.6% 26573 ± 1% -85.2% 3907 ± 4% -85.1% 3941 ± 1% proc-vmstat.nr_active_file
> 946.75 ± 3% +764.9% 8188 ± 2% +745.5% 8004 ± 1% +196.1% 2803 ± 1% proc-vmstat.nr_alloc_batch
> 95188 ± 0% -0.2% 95035 ± 0% +24.5% 118472 ± 0% +24.4% 118440 ± 0% proc-vmstat.nr_inactive_file
> 3005733 ± 3% -4.4% 2872963 ± 2% -14.6% 2566600 ± 6% -13.9% 2587727 ± 1% proc-vmstat.numa_hint_faults_local
> 3652636 ± 1% -4.4% 3492233 ± 2% -16.5% 3049926 ± 2% -14.0% 3139498 ± 0% proc-vmstat.numa_hit
> 3643257 ± 1% -4.4% 3483323 ± 2% -16.3% 3049910 ± 2% -13.8% 3139486 ± 0% proc-vmstat.numa_local
> 9379 ± 0% -5.0% 8909 ± 12% -99.8% 16.25 ± 7% -99.9% 12.25 ± 27% proc-vmstat.numa_other
> 4924994 ± 3% +0.9% 4966927 ± 9% +38.2% 6804572 ± 5% +38.7% 6831202 ± 4% proc-vmstat.numa_pages_migrated
> 8510 ± 0% +1.5% 8638 ± 1% -27.1% 6204 ± 31% -11.2% 7554 ± 1% proc-vmstat.pgactivate
> 2403080 ± 2% -58.7% 993450 ± 2% -57.0% 1033978 ± 4% -39.3% 1457730 ± 3% proc-vmstat.pgalloc_dma32
> 15038432 ± 0% +8.1% 16250009 ± 3% +16.9% 17583879 ± 2% +14.9% 17277548 ± 1% proc-vmstat.pgalloc_normal
> 32128 ± 22% +41.4% 45421 ± 21% +391.6% 157952 ± 9% +333.9% 139392 ± 11% proc-vmstat.pgmigrate_fail
> 4924994 ± 3% +0.9% 4966927 ± 9% +38.2% 6804572 ± 5% +38.7% 6831202 ± 4% proc-vmstat.pgmigrate_success
> 25886 ± 2% -1.2% 25585 ± 4% +12.0% 28981 ± 2% +12.5% 29132 ± 2% proc-vmstat.thp_deferred_split_page
> 632.75 ± 3% -1.6% 622.43 ± 3% -18.1% 518.50 ± 2% -19.4% 510.00 ± 0% slabinfo.RAW.active_objs
> 632.75 ± 3% -1.6% 622.43 ± 3% -18.1% 518.50 ± 2% -19.4% 510.00 ± 0% slabinfo.RAW.num_objs
> 1512 ± 1% -0.6% 1502 ± 1% -100.0% 0.00 ± -1% -100.0% 0.00 ± -1% slabinfo.UNIX.active_objs
> 1512 ± 1% -0.6% 1502 ± 1% -100.0% 0.00 ± -1% -100.0% 0.00 ± -1% slabinfo.UNIX.num_objs
> 766.50 ± 10% +7.5% 823.86 ± 10% -100.0% 0.00 ± -1% -100.0% 0.00 ± -1% slabinfo.avc_xperms_node.active_objs
> 766.50 ± 10% +7.5% 823.86 ± 10% -100.0% 0.00 ± -1% -100.0% 0.00 ± -1% slabinfo.avc_xperms_node.num_objs
> 507.00 ± 9% +16.1% 588.57 ± 10% +21.1% 614.00 ± 4% +3.1% 522.75 ± 10% slabinfo.file_lock_cache.active_objs
> 507.00 ± 9% +16.1% 588.57 ± 10% +21.1% 614.00 ± 4% +3.1% 522.75 ± 10% slabinfo.file_lock_cache.num_objs
> 13334 ± 4% +6.9% 14255 ± 4% +12.1% 14952 ± 4% +11.8% 14907 ± 11% slabinfo.kmalloc-512.num_objs
> 357.00 ± 2% +0.7% 359.43 ± 1% +35.2% 482.75 ± 0% +33.9% 478.00 ± 0% slabinfo.kmalloc-8192.num_objs
> 8080 ± 3% +1.9% 8233 ± 4% +16.8% 9441 ± 5% +17.2% 9470 ± 1% slabinfo.kmalloc-96.active_objs
> 8125 ± 3% +1.9% 8281 ± 4% +16.8% 9488 ± 5% +17.1% 9511 ± 1% slabinfo.kmalloc-96.num_objs
> 1112 ± 4% -2.5% 1084 ± 5% +6.6% 1186 ± 11% +11.9% 1244 ± 2% slabinfo.task_group.active_objs
> 1112 ± 4% -2.5% 1084 ± 5% +6.6% 1186 ± 11% +11.9% 1244 ± 2% slabinfo.task_group.num_objs
> 18.81 ± 7% +12.4% 21.13 ± 28% +4.5e+06% 837325 ± 3% +4.5e+06% 846688 ± 0% sched_debug.cfs_rq:/.load.avg
> 90.42 ± 75% +88.7% 170.62 ±137% +1.1e+06% 1028138 ± 0% +1.3e+06% 1157227 ± 19% sched_debug.cfs_rq:/.load.max
> 10.83 ± 25% +13.6% 12.31 ± 12% +4.6e+06% 500135 ± 62% +5.8e+06% 625582 ± 11% sched_debug.cfs_rq:/.load.min
> 12.00 ± 81% +96.8% 23.63 ±144% +9.6e+05% 115762 ± 29% +8e+05% 96269 ± 29% sched_debug.cfs_rq:/.load.stddev
> 26.71 ± 11% -8.9% 24.33 ± 10% +2902.0% 801.76 ± 2% +2935.3% 810.66 ± 0% sched_debug.cfs_rq:/.load_avg.avg
> 241.42 ± 34% -32.4% 163.29 ± 54% +294.1% 951.38 ± 2% +299.9% 965.33 ± 3% sched_debug.cfs_rq:/.load_avg.max
> 14.13 ± 5% +4.8% 14.81 ± 4% +3872.3% 561.08 ± 18% +4326.5% 625.25 ± 5% sched_debug.cfs_rq:/.load_avg.min
> 37.52 ± 37% -35.7% 24.15 ± 48% +103.7% 76.43 ± 19% +49.9% 56.27 ± 7% sched_debug.cfs_rq:/.load_avg.stddev
> 6864771 ± 0% +1.6% 6971358 ± 0% -97.9% 146805 ± 0% -97.9% 147296 ± 0% sched_debug.cfs_rq:/.min_vruntime.avg
> 6984488 ± 0% +1.2% 7071775 ± 0% -97.7% 158812 ± 0% -97.7% 160483 ± 1% sched_debug.cfs_rq:/.min_vruntime.max
> 6522931 ± 1% +1.2% 6598038 ± 1% -97.8% 141019 ± 1% -97.8% 141943 ± 0% sched_debug.cfs_rq:/.min_vruntime.min
> 80297 ± 7% -5.5% 75882 ± 12% -95.3% 3775 ± 7% -95.4% 3703 ± 9% sched_debug.cfs_rq:/.min_vruntime.stddev
> 16.76 ± 1% +1.4% 16.98 ± 0% +4570.6% 782.68 ± 2% +4662.4% 798.07 ± 0% sched_debug.cfs_rq:/.runnable_load_avg.avg
> 28.88 ± 7% +10.6% 31.93 ± 7% +3065.5% 914.04 ± 3% +3138.1% 935.00 ± 1% sched_debug.cfs_rq:/.runnable_load_avg.max
> 9.54 ± 28% +18.5% 11.31 ± 11% +4105.7% 401.29 ± 62% +5300.0% 515.25 ± 13% sched_debug.cfs_rq:/.runnable_load_avg.min
> 2.92 ± 11% +2.0% 2.98 ± 10% +3057.4% 92.11 ± 45% +2153.7% 65.75 ± 13% sched_debug.cfs_rq:/.runnable_load_avg.stddev
> 83900 ± 25% -46.8% 44629 ± 65% -98.9% 894.85 ±155% -99.8% 201.22 ±111% sched_debug.cfs_rq:/.spread0.max
> -377675 ±-21% +13.7% -429229 ±-20% -95.5% -16912 ± -5% -95.1% -18353 ±-11% sched_debug.cfs_rq:/.spread0.min
> 80284 ± 7% -5.5% 75895 ± 12% -95.3% 3778 ± 7% -95.4% 3707 ± 9% sched_debug.cfs_rq:/.spread0.stddev
> 81.92 ± 23% -30.5% 56.96 ± 12% -6.6% 76.55 ± 29% -28.3% 58.74 ± 9% sched_debug.cfs_rq:/.util_avg.stddev
> 249892 ± 16% -2.1% 244699 ± 33% +94.1% 485129 ± 13% +114.3% 535496 ± 22% sched_debug.cpu.avg_idle.min
> 149745 ± 9% +13.0% 169186 ± 7% -28.0% 107794 ± 16% -8.4% 137183 ± 70% sched_debug.cpu.avg_idle.stddev
> 2.94 ± 10% +21.0% 3.56 ± 33% +107.3% 6.10 ± 7% +84.9% 5.44 ± 11% sched_debug.cpu.clock.stddev
> 2.94 ± 10% +21.0% 3.56 ± 33% +107.3% 6.10 ± 7% +84.9% 5.44 ± 11% sched_debug.cpu.clock_task.stddev
> 17.64 ± 9% -0.9% 17.48 ± 7% +4333.8% 781.92 ± 2% +4425.8% 798.14 ± 0% sched_debug.cpu.cpu_load[0].avg
> 69.46 ±103% -20.3% 55.38 ±102% +1216.0% 914.04 ± 3% +1246.3% 935.08 ± 1% sched_debug.cpu.cpu_load[0].max
> 11.08 ± 24% +25.0% 13.86 ± 12% +3294.7% 376.25 ± 67% +4554.1% 515.83 ± 12% sched_debug.cpu.cpu_load[0].min
> 8.49 ±115% -28.6% 6.06 ±132% +1028.8% 95.82 ± 43% +674.4% 65.73 ± 13% sched_debug.cpu.cpu_load[0].stddev
> 17.31 ± 5% -0.1% 17.29 ± 3% +4472.2% 791.32 ± 2% +4547.7% 804.39 ± 0% sched_debug.cpu.cpu_load[1].avg
> 48.17 ± 72% -8.0% 44.33 ± 60% +1832.6% 930.88 ± 2% +1837.6% 933.29 ± 1% sched_debug.cpu.cpu_load[1].max
> 12.04 ± 16% +15.5% 13.90 ± 11% +4315.6% 531.71 ± 19% +5030.8% 617.83 ± 4% sched_debug.cpu.cpu_load[1].min
> 5.37 ± 86% -16.0% 4.51 ± 82% +1297.0% 75.04 ± 37% +890.6% 53.21 ± 14% sched_debug.cpu.cpu_load[1].stddev
> 17.22 ± 3% -0.2% 17.19 ± 1% +4482.9% 788.99 ± 2% +4559.4% 802.18 ± 0% sched_debug.cpu.cpu_load[2].avg
> 40.29 ± 36% -4.6% 38.43 ± 32% +2179.5% 918.46 ± 1% +2210.0% 930.75 ± 1% sched_debug.cpu.cpu_load[2].max
> 12.25 ± 16% +13.1% 13.86 ± 10% +4163.6% 522.29 ± 21% +4879.3% 609.96 ± 5% sched_debug.cpu.cpu_load[2].min
> 4.29 ± 45% -13.7% 3.70 ± 44% +1627.3% 74.02 ± 36% +1125.0% 52.50 ± 13% sched_debug.cpu.cpu_load[2].stddev
> 17.16 ± 2% -0.2% 17.13 ± 1% +4483.1% 786.38 ± 2% +4563.1% 800.09 ± 0% sched_debug.cpu.cpu_load[3].avg
> 36.12 ± 14% -3.7% 34.79 ± 15% +2413.4% 907.96 ± 2% +2461.4% 925.29 ± 1% sched_debug.cpu.cpu_load[3].max
> 12.38 ± 15% +12.7% 13.95 ± 9% +3985.2% 505.54 ± 25% +4706.1% 594.75 ± 4% sched_debug.cpu.cpu_load[3].min
> 3.72 ± 21% -14.2% 3.19 ± 22% +1887.6% 73.94 ± 37% +1343.2% 53.69 ± 10% sched_debug.cpu.cpu_load[3].stddev
> 17.12 ± 1% -0.1% 17.10 ± 1% +4478.8% 783.84 ± 2% +4561.4% 797.98 ± 0% sched_debug.cpu.cpu_load[4].avg
> 33.42 ± 2% -2.6% 32.55 ± 7% +2573.3% 893.33 ± 3% +2633.2% 913.33 ± 1% sched_debug.cpu.cpu_load[4].max
> 12.88 ± 12% +10.4% 14.21 ± 7% +3701.3% 489.42 ± 26% +4365.7% 574.96 ± 6% sched_debug.cpu.cpu_load[4].min
> 3.30 ± 5% -12.2% 2.89 ± 14% +2147.0% 74.10 ± 38% +1571.6% 55.12 ± 9% sched_debug.cpu.cpu_load[4].stddev
> 1722 ± 32% +14.8% 1977 ± 39% +50.3% 2588 ± 58% +49.2% 2570 ± 17% sched_debug.cpu.curr->pid.min
> 20.57 ± 7% +5.1% 21.62 ± 27% +4.1e+06% 835527 ± 2% +4.1e+06% 847625 ± 0% sched_debug.cpu.load.avg
> 169.12 ± 41% +15.5% 195.38 ±117% +6.1e+05% 1027133 ± 0% +7.1e+05% 1200175 ± 14% sched_debug.cpu.load.max
> 10.88 ± 25% +13.2% 12.31 ± 12% +4.6e+06% 500134 ± 62% +5.8e+06% 625582 ± 11% sched_debug.cpu.load.min
> 23.05 ± 44% +17.7% 27.14 ±122% +4.9e+05% 113672 ± 31% +4.4e+05% 102223 ± 24% sched_debug.cpu.load.stddev
> 0.00 ± 2% +2.4% 0.00 ± 1% +31.6% 0.00 ± 14% +20.6% 0.00 ± 17% sched_debug.cpu.next_balance.stddev
> 1623 ± 9% +4.1% 1689 ± 8% +74.5% 2831 ± 3% +73.9% 2823 ± 7% sched_debug.cpu.nr_load_updates.stddev
> 159639 ± 1% -11.5% 141259 ± 8% -17.0% 132534 ± 1% -15.7% 134652 ± 1% sched_debug.cpu.nr_switches.avg
> 11.79 ± 15% +9.0% 12.86 ± 25% +268.6% 43.46 ± 18% +273.9% 44.08 ± 11% sched_debug.cpu.nr_uninterruptible.max
> -16.00 ±-13% -5.2% -15.17 ±-22% +337.5% -70.00 ±-10% +336.7% -69.88 ±-26% sched_debug.cpu.nr_uninterruptible.min
> 5.10 ± 9% -1.0% 5.05 ± 8% +414.2% 26.25 ± 16% +399.5% 25.50 ± 10% sched_debug.cpu.nr_uninterruptible.stddev
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 1.01 ±139% +Inf% 6.56 ± 22% perf-profile.cycles-pp.__account_scheduler_latency.enqueue_entity.enqueue_task_fair.activate_task.ttwu_do_activate
> 0.00 ± -1% +Inf% 5.76 ±172% +Inf% 5.79 ±122% +Inf% 16.45 ± 16% perf-profile.cycles-pp.__do_page_fault.do_page_fault.page_fault
> 0.00 ± -1% +Inf% 1.58 ±162% +Inf% 4.57 ±139% +Inf% 3.30 ± 19% perf-profile.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.local_apic_timer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
> 0.00 ± -1% +Inf% 0.93 ±158% +Inf% 1.05 ±102% +Inf% 3.13 ± 16% perf-profile.cycles-pp.__kernel_text_address.print_context_stack.dump_trace.save_stack_trace_tsk.__account_scheduler_latency
> 0.00 ± -1% +Inf% 0.40 ±159% +Inf% 0.57 ±104% +Inf% 1.18 ± 6% perf-profile.cycles-pp.__schedule.schedule.exit_to_usermode_loop.syscall_return_slowpath.entry_SYSCALL_64_fastpath
> 0.00 ± -1% +Inf% 0.59 ±159% +Inf% 0.83 ±100% +Inf% 2.00 ± 23% perf-profile.cycles-pp.__schedule.schedule.pipe_wait.pipe_write.__vfs_write
> 0.00 ± -1% +Inf% 6.44 ±159% +Inf% 8.27 ±108% +Inf% 19.49 ± 11% perf-profile.cycles-pp.__vfs_read.vfs_read.sys_read.entry_SYSCALL_64_fastpath
> 0.00 ± -1% +Inf% 3.43 ±158% +Inf% 4.53 ±100% +Inf% 11.07 ± 16% perf-profile.cycles-pp.__vfs_write.vfs_write.sys_write.entry_SYSCALL_64_fastpath
> 0.00 ± -1% +Inf% 3.18 ±158% +Inf% 3.39 ±102% +Inf% 9.81 ± 22% perf-profile.cycles-pp.__wake_up_common.__wake_up_sync_key.pipe_read.__vfs_read.vfs_read
> 0.00 ± -1% +Inf% 3.24 ±158% +Inf% 3.44 ±102% +Inf% 10.05 ± 22% perf-profile.cycles-pp.__wake_up_sync_key.pipe_read.__vfs_read.vfs_read.sys_read
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 1.23 ±141% +Inf% 8.04 ± 21% perf-profile.cycles-pp.activate_task.ttwu_do_activate.try_to_wake_up.default_wake_function.autoremove_wake_function
> 0.00 ± -1% +Inf% 0.23 ±166% +Inf% 0.34 ±104% +Inf% 0.81 ± 27% perf-profile.cycles-pp.anon_pipe_buf_release.__vfs_read.vfs_read.sys_read.entry_SYSCALL_64_fastpath
> 0.02 ± 0% +10478.6% 2.12 ±162% +31737.5% 6.37 ±138% +22075.0% 4.43 ± 18% perf-profile.cycles-pp.apic_timer_interrupt
> 0.00 ± -1% +Inf% 3.14 ±158% +Inf% 3.38 ±102% +Inf% 9.79 ± 22% perf-profile.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.pipe_read.__vfs_read
> 0.00 ± -1% +Inf% 0.24 ±159% +Inf% 0.51 ±100% +Inf% 1.00 ± 24% perf-profile.cycles-pp.bit_cursor.fb_flashcursor.process_one_work.worker_thread.kthread
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 2.84 ±153% +Inf% 0.85 ± 20% perf-profile.cycles-pp.call_console_drivers.constprop.23.console_unlock.vprintk_emit.vprintk_default.printk
> 27.10 ± 20% -53.0% 12.73 ±105% -100.0% 0.00 ± -1% -100.0% 0.00 ± -1% perf-profile.cycles-pp.call_cpuidle
> 0.00 ± -1% +Inf% 4.85 ±181% +Inf% 15.07 ±118% +Inf% 29.07 ± 9% perf-profile.cycles-pp.call_cpuidle.cpu_startup_entry.start_secondary
> 0.02 ± 19% +5855.6% 1.34 ±175% +5722.2% 1.31 ±107% +14988.9% 3.40 ± 9% perf-profile.cycles-pp.call_function_interrupt
> 0.00 ± -1% +Inf% 0.08 ±244% +Inf% 2.84 ±153% +Inf% 0.85 ± 20% perf-profile.cycles-pp.console_unlock.vprintk_emit.vprintk_default.printk.perf_duration_warn
> 0.00 ± -1% +Inf% 2.23 ±182% +Inf% 2.46 ±131% +Inf% 6.55 ± 31% perf-profile.cycles-pp.copy_page.migrate_misplaced_transhuge_page.do_huge_pmd_numa_page.handle_mm_fault.__do_page_fault
> 0.00 ± -1% +Inf% 2.42 ±159% +Inf% 3.04 ±100% +Inf% 7.95 ± 16% perf-profile.cycles-pp.copy_page_from_iter.pipe_write.__vfs_write.vfs_write.sys_write
> 0.00 ± -1% +Inf% 0.35 ±160% +Inf% 0.49 ±104% +Inf% 0.98 ± 19% perf-profile.cycles-pp.copy_page_from_iter_iovec.copy_page_from_iter.pipe_write.__vfs_write.vfs_write
> 0.00 ± -1% +Inf% 2.25 ±163% +Inf% 3.46 ±114% +Inf% 6.78 ± 17% perf-profile.cycles-pp.copy_page_to_iter.pipe_read.__vfs_read.vfs_read.sys_read
> 0.00 ± -1% +Inf% 2.04 ±159% +Inf% 2.55 ±101% +Inf% 6.82 ± 16% perf-profile.cycles-pp.copy_user_enhanced_fast_string.copy_page_from_iter.pipe_write.__vfs_write.vfs_write
> 0.00 ± -1% +Inf% 2.02 ±160% +Inf% 3.12 ±110% +Inf% 6.31 ± 15% perf-profile.cycles-pp.copy_user_enhanced_fast_string.copy_page_to_iter.pipe_read.__vfs_read.vfs_read
> 28.68 ± 21% -55.2% 12.84 ±105% -100.0% 0.00 ± -1% -100.0% 0.00 ± -1% perf-profile.cycles-pp.cpu_startup_entry
> 0.00 ± -1% +Inf% 4.91 ±181% +Inf% 15.14 ±118% +Inf% 29.36 ± 8% perf-profile.cycles-pp.cpu_startup_entry.start_secondary
> 27.10 ± 20% -53.0% 12.73 ±105% -100.0% 0.00 ± -1% -100.0% 0.00 ± -1% perf-profile.cycles-pp.cpuidle_enter
> 0.00 ± -1% +Inf% 4.85 ±181% +Inf% 15.07 ±118% +Inf% 29.07 ± 9% perf-profile.cycles-pp.cpuidle_enter.call_cpuidle.cpu_startup_entry.start_secondary
> 26.95 ± 20% -53.2% 12.62 ±105% -100.0% 0.00 ± -1% -100.0% 0.00 ± -1% perf-profile.cycles-pp.cpuidle_enter_state
> 0.00 ± -1% +Inf% 4.79 ±181% +Inf% 15.03 ±118% +Inf% 28.79 ± 9% perf-profile.cycles-pp.cpuidle_enter_state.cpuidle_enter.call_cpuidle.cpu_startup_entry.start_secondary
> 0.00 ± -1% +Inf% 0.34 ±158% +Inf% 0.54 ±103% +Inf% 1.14 ± 28% perf-profile.cycles-pp.deactivate_task.__schedule.schedule.pipe_wait.pipe_write
> 0.00 ± -1% +Inf% 3.12 ±158% +Inf% 3.37 ±102% +Inf% 9.75 ± 22% perf-profile.cycles-pp.default_wake_function.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.pipe_read
> 0.00 ± -1% +Inf% 0.29 ±159% +Inf% 0.41 ±100% +Inf% 0.97 ± 30% perf-profile.cycles-pp.dequeue_task_fair.deactivate_task.__schedule.schedule.pipe_wait
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 0.16 ±173% +Inf% 0.88 ± 39% perf-profile.cycles-pp.do_execveat_common.isra.34.sys_execve.do_syscall_64.return_from_SYSCALL_64.execve
> 0.00 ± -1% +Inf% 2.92 ±179% +Inf% 3.08 ±131% +Inf% 8.56 ± 30% perf-profile.cycles-pp.do_huge_pmd_numa_page.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
> 0.00 ± -1% +Inf% 5.78 ±172% +Inf% 5.81 ±122% +Inf% 16.48 ± 16% perf-profile.cycles-pp.do_page_fault.page_fault
> 0.00 ± -1% +Inf% 0.32 ±165% +Inf% 0.17 ±173% +Inf% 0.89 ± 39% perf-profile.cycles-pp.do_syscall_64.return_from_SYSCALL_64.execve
> 0.00 ± -1% +Inf% 1.78 ±158% +Inf% 0.96 ±141% +Inf% 6.24 ± 22% perf-profile.cycles-pp.dump_trace.save_stack_trace_tsk.__account_scheduler_latency.enqueue_entity.enqueue_task_fair
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 1.14 ±140% +Inf% 7.53 ± 22% perf-profile.cycles-pp.enqueue_entity.enqueue_task_fair.activate_task.ttwu_do_activate.try_to_wake_up
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 1.21 ±141% +Inf% 7.86 ± 21% perf-profile.cycles-pp.enqueue_task_fair.activate_task.ttwu_do_activate.try_to_wake_up.default_wake_function
> 0.02 ± 0% +55407.1% 11.10 ±157% +71087.5% 14.24 ±103% +1.7e+05% 34.11 ± 11% perf-profile.cycles-pp.entry_SYSCALL_64_fastpath
> 0.03 ± 47% +1120.8% 0.34 ±155% +509.1% 0.17 ±173% +3127.3% 0.89 ± 39% perf-profile.cycles-pp.execve
> 0.00 ± -1% +Inf% 0.47 ±160% +Inf% 0.64 ±104% +Inf% 1.35 ± 4% perf-profile.cycles-pp.exit_to_usermode_loop.syscall_return_slowpath.entry_SYSCALL_64_fastpath
> 0.00 ± -1% +Inf% 0.24 ±159% +Inf% 0.56 ±101% +Inf% 1.20 ± 28% perf-profile.cycles-pp.fb_flashcursor.process_one_work.worker_thread.kthread.ret_from_fork
> 0.00 ± -1% +Inf% 0.94 ±179% +Inf% 1.05 ±105% +Inf% 2.70 ± 11% perf-profile.cycles-pp.flush_smp_call_function_queue.generic_smp_call_function_single_interrupt.smp_call_function_interrupt.call_function_interrupt
> 0.00 ± -1% +Inf% 0.39 ±180% +Inf% 0.39 ±107% +Inf% 1.25 ± 13% perf-profile.cycles-pp.flush_tlb_func.flush_smp_call_function_queue.generic_smp_call_function_single_interrupt.smp_call_function_interrupt.call_function_interrupt
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 0.95 ±115% +Inf% 2.86 ± 21% perf-profile.cycles-pp.flush_tlb_page.ptep_clear_flush.try_to_unmap_one.rmap_walk_anon.rmap_walk
> 0.00 ± -1% +Inf% 1.12 ±177% +Inf% 1.06 ±104% +Inf% 2.79 ± 11% perf-profile.cycles-pp.generic_smp_call_function_single_interrupt.smp_call_function_interrupt.call_function_interrupt
> 0.00 ± -1% +Inf% 5.49 ±172% +Inf% 5.56 ±123% +Inf% 15.77 ± 17% perf-profile.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 2.08 ±112% +Inf% 6.05 ± 16% perf-profile.cycles-pp.handle_pte_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
> 0.00 ± -1% +Inf% 1.79 ±163% +Inf% 5.26 ±138% +Inf% 3.69 ± 19% perf-profile.cycles-pp.hrtimer_interrupt.local_apic_timer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
> 0.00 ± -1% +Inf% 0.32 ±159% +Inf% 0.19 ±173% +Inf% 0.91 ± 33% perf-profile.cycles-pp.idle_cpu.select_idle_sibling.select_task_rq_fair.try_to_wake_up.default_wake_function
> 24.41 ± 20% -50.5% 12.09 ±104% -100.0% 0.00 ± -1% -100.0% 0.00 ± -1% perf-profile.cycles-pp.intel_idle
> 0.00 ± -1% +Inf% 3.67 ±234% +Inf% 4.27 ±165% +Inf% 28.93 ± 9% perf-profile.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.call_cpuidle.cpu_startup_entry
> 0.11 ±100% -25.2% 0.08 ±244% +2352.4% 2.57 ±150% +709.5% 0.85 ± 20% perf-profile.cycles-pp.irq_work_interrupt
> 0.00 ± -1% +Inf% 0.08 ±244% +Inf% 2.57 ±150% +Inf% 0.85 ± 20% perf-profile.cycles-pp.irq_work_run.smp_irq_work_interrupt.irq_work_interrupt
> 0.00 ± -1% +Inf% 0.08 ±244% +Inf% 2.57 ±150% +Inf% 0.85 ± 20% perf-profile.cycles-pp.irq_work_run_list.irq_work_run.smp_irq_work_interrupt.irq_work_interrupt
> 0.00 ± -1% +Inf% 0.37 ±160% +Inf% 0.37 ±103% +Inf% 1.33 ± 13% perf-profile.cycles-pp.is_module_text_address.__kernel_text_address.print_context_stack.dump_trace.save_stack_trace_tsk
> 0.00 ± -1% +Inf% 0.44 ±160% +Inf% 0.85 ±100% +Inf% 1.86 ± 14% perf-profile.cycles-pp.kthread.ret_from_fork
> 0.00 ± -1% +Inf% 1.83 ±163% +Inf% 5.35 ±138% +Inf% 3.82 ± 19% perf-profile.cycles-pp.local_apic_timer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
> 0.00 ± -1% +Inf% 0.24 ±159% +Inf% 0.47 ±100% +Inf% 0.99 ± 25% perf-profile.cycles-pp.memcpy_erms.mga_imageblit.soft_cursor.bit_cursor.fb_flashcursor
> 0.00 ± -1% +Inf% 0.24 ±159% +Inf% 0.51 ±100% +Inf% 1.00 ± 24% perf-profile.cycles-pp.mga_imageblit.soft_cursor.bit_cursor.fb_flashcursor.process_one_work
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 1.51 ±115% +Inf% 4.33 ± 21% perf-profile.cycles-pp.migrate_misplaced_page.handle_pte_fault.handle_mm_fault.__do_page_fault.do_page_fault
> 0.00 ± -1% +Inf% 2.79 ±183% +Inf% 3.05 ±132% +Inf% 8.22 ± 30% perf-profile.cycles-pp.migrate_misplaced_transhuge_page.do_huge_pmd_numa_page.handle_mm_fault.__do_page_fault.do_page_fault
> 0.00 ± -1% +Inf% 0.28 ±244% +Inf% 0.33 ±173% +Inf% 1.07 ± 26% perf-profile.cycles-pp.migrate_page_copy.migrate_misplaced_transhuge_page.do_huge_pmd_numa_page.handle_mm_fault.__do_page_fault
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 1.23 ±116% +Inf% 3.82 ± 20% perf-profile.cycles-pp.migrate_pages.migrate_misplaced_page.handle_pte_fault.handle_mm_fault.__do_page_fault
> 0.40 ±162% -77.9% 0.09 ±154% -100.0% 0.00 ± -1% -100.0% 0.00 ± -1% perf-profile.cycles-pp.mutex_spin_on_owner.isra.4
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 0.94 ±116% +Inf% 2.82 ± 20% perf-profile.cycles-pp.native_flush_tlb_others.flush_tlb_page.ptep_clear_flush.try_to_unmap_one.rmap_walk_anon
> 0.01 ±100% +4914.3% 0.50 ±162% +7425.0% 0.75 ±126% +12500.0% 1.26 ± 17% perf-profile.cycles-pp.native_irq_return_iret
> 0.00 ± -1% +Inf% 0.44 ±188% +Inf% 0.37 ±173% +Inf% 1.20 ± 22% perf-profile.cycles-pp.native_send_call_func_ipi.smp_call_function_many.native_flush_tlb_others.flush_tlb_page.ptep_clear_flush
> 0.02 ± 19% +25709.5% 5.81 ±171% +25733.3% 5.81 ±122% +73266.7% 16.51 ± 16% perf-profile.cycles-pp.page_fault
> 0.00 ± -1% +Inf% 0.08 ±244% +Inf% 2.57 ±150% +Inf% 0.85 ± 20% perf-profile.cycles-pp.perf_duration_warn.irq_work_run_list.irq_work_run.smp_irq_work_interrupt.irq_work_interrupt
> 0.50 ±104% -6.2% 0.47 ±157% -100.0% 0.00 ± -1% -100.0% 0.00 ± -1% perf-profile.cycles-pp.pipe_read
> 0.00 ± -1% +Inf% 6.10 ±159% +Inf% 7.73 ±109% +Inf% 18.29 ± 11% perf-profile.cycles-pp.pipe_read.__vfs_read.vfs_read.sys_read.entry_SYSCALL_64_fastpath
> 0.00 ± -1% +Inf% 0.80 ±158% +Inf% 1.03 ±100% +Inf% 2.40 ± 24% perf-profile.cycles-pp.pipe_wait.pipe_write.__vfs_write.vfs_write.sys_write
> 0.00 ± -1% +Inf% 3.41 ±158% +Inf% 2.56 ±147% +Inf% 11.80 ± 18% perf-profile.cycles-pp.pipe_write.__vfs_write.vfs_write.sys_write.entry_SYSCALL_64_fastpath
> 2.49 ± 40% -79.0% 0.52 ±151% -100.0% 0.00 ± -1% -100.0% 0.00 ± -1% perf-profile.cycles-pp.poll_idle
> 0.00 ± -1% +Inf% 1.69 ±158% +Inf% 0.87 ±139% +Inf% 5.77 ± 22% perf-profile.cycles-pp.print_context_stack.dump_trace.save_stack_trace_tsk.__account_scheduler_latency.enqueue_entity
> 0.00 ± -1% +Inf% 0.08 ±244% +Inf% 2.57 ±150% +Inf% 0.85 ± 20% perf-profile.cycles-pp.printk.perf_duration_warn.irq_work_run_list.irq_work_run.smp_irq_work_interrupt
> 0.00 ± -1% +Inf% 0.25 ±160% +Inf% 0.60 ±100% +Inf% 1.21 ± 27% perf-profile.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 0.95 ±115% +Inf% 2.87 ± 21% perf-profile.cycles-pp.ptep_clear_flush.try_to_unmap_one.rmap_walk_anon.rmap_walk.try_to_unmap
> 0.02 ± 0% +807.1% 0.18 ±209% +1700.0% 0.36 ±102% +4900.0% 1.00 ± 29% perf-profile.cycles-pp.read
> 1.11 ± 61% -97.6% 0.03 ±216% -100.0% 0.00 ± -1% -100.0% 0.00 ± -1% perf-profile.cycles-pp.rest_init
> 0.02 ± 24% +2479.6% 0.45 ±153% +4742.9% 0.85 ±100% +10542.9% 1.86 ± 14% perf-profile.cycles-pp.ret_from_fork
> 0.00 ± -1% +Inf% 0.32 ±165% +Inf% 0.17 ±173% +Inf% 0.89 ± 39% perf-profile.cycles-pp.return_from_SYSCALL_64.execve
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 0.98 ±116% +Inf% 2.97 ± 23% perf-profile.cycles-pp.rmap_walk.try_to_unmap.migrate_pages.migrate_misplaced_page.handle_pte_fault
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 0.97 ±115% +Inf% 2.97 ± 23% perf-profile.cycles-pp.rmap_walk_anon.rmap_walk.try_to_unmap.migrate_pages.migrate_misplaced_page
> 0.00 ± -1% +Inf% 1.79 ±158% +Inf% 0.97 ±139% +Inf% 6.27 ± 22% perf-profile.cycles-pp.save_stack_trace_tsk.__account_scheduler_latency.enqueue_entity.enqueue_task_fair.activate_task
> 0.00 ± -1% +Inf% 0.43 ±159% +Inf% 0.61 ±103% +Inf% 1.27 ± 3% perf-profile.cycles-pp.schedule.exit_to_usermode_loop.syscall_return_slowpath.entry_SYSCALL_64_fastpath
> 0.00 ± -1% +Inf% 0.60 ±159% +Inf% 0.88 ±100% +Inf% 2.06 ± 22% perf-profile.cycles-pp.schedule.pipe_wait.pipe_write.__vfs_write.vfs_write
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 2.38 ±136% +Inf% 1.74 ± 20% perf-profile.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.isra.17.tick_sched_timer.__hrtimer_run_queues
> 0.00 ± -1% +Inf% 0.42 ±159% +Inf% 0.42 ±113% +Inf% 1.28 ± 28% perf-profile.cycles-pp.select_idle_sibling.select_task_rq_fair.try_to_wake_up.default_wake_function.autoremove_wake_function
> 0.00 ± -1% +Inf% 0.51 ±158% +Inf% 0.53 ±109% +Inf% 1.54 ± 27% perf-profile.cycles-pp.select_task_rq_fair.try_to_wake_up.default_wake_function.autoremove_wake_function.__wake_up_common
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 2.69 ±152% +Inf% 0.84 ± 19% perf-profile.cycles-pp.serial8250_console_putchar.uart_console_write.serial8250_console_write.univ8250_console_write.call_console_drivers.constprop.23
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 2.69 ±152% +Inf% 0.84 ± 19% perf-profile.cycles-pp.serial8250_console_write.univ8250_console_write.call_console_drivers.constprop.23.console_unlock.vprintk_emit
> 0.00 ± -1% +Inf% 2.04 ±163% +Inf% 6.27 ±138% +Inf% 4.34 ± 18% perf-profile.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt
> 0.00 ± -1% +Inf% 1.28 ±178% +Inf% 1.21 ±105% +Inf% 3.17 ± 8% perf-profile.cycles-pp.smp_call_function_interrupt.call_function_interrupt
> 0.00 ± -1% +Inf% 0.93 ±177% +Inf% 0.94 ±116% +Inf% 2.80 ± 19% perf-profile.cycles-pp.smp_call_function_many.native_flush_tlb_others.flush_tlb_page.ptep_clear_flush.try_to_unmap_one
> 0.00 ± -1% +Inf% 0.08 ±244% +Inf% 2.57 ±150% +Inf% 0.85 ± 20% perf-profile.cycles-pp.smp_irq_work_interrupt.irq_work_interrupt
> 0.00 ± -1% +Inf% 0.24 ±159% +Inf% 0.51 ±100% +Inf% 1.00 ± 24% perf-profile.cycles-pp.soft_cursor.bit_cursor.fb_flashcursor.process_one_work.worker_thread
> 1.11 ± 61% -97.6% 0.03 ±216% -100.0% 0.00 ± -1% -100.0% 0.00 ± -1% perf-profile.cycles-pp.start_kernel
> 0.00 ± -1% +Inf% 0.32 ±165% +Inf% 0.17 ±173% +Inf% 0.89 ± 39% perf-profile.cycles-pp.sys_execve.do_syscall_64.return_from_SYSCALL_64.execve
> 0.00 ± -1% +Inf% 6.82 ±159% +Inf% 8.52 ±108% +Inf% 20.43 ± 11% perf-profile.cycles-pp.sys_read.entry_SYSCALL_64_fastpath
> 0.00 ± -1% +Inf% 3.45 ±158% +Inf% 4.57 ±100% +Inf% 11.18 ± 16% perf-profile.cycles-pp.sys_write.entry_SYSCALL_64_fastpath
> 0.00 ± -1% +Inf% 0.49 ±159% +Inf% 0.64 ±103% +Inf% 1.44 ± 5% perf-profile.cycles-pp.syscall_return_slowpath.entry_SYSCALL_64_fastpath
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 1.70 ±141% +Inf% 1.16 ± 25% perf-profile.cycles-pp.task_tick_fair.scheduler_tick.update_process_times.tick_sched_handle.isra.17.tick_sched_timer
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 3.39 ±134% +Inf% 2.78 ± 22% perf-profile.cycles-pp.tick_sched_handle.isra.17.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.local_apic_timer_interrupt
> 0.00 ± -1% +Inf% 1.35 ±162% +Inf% 3.59 ±135% +Inf% 2.89 ± 22% perf-profile.cycles-pp.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.local_apic_timer_interrupt.smp_apic_timer_interrupt
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 0.98 ±116% +Inf% 3.00 ± 23% perf-profile.cycles-pp.try_to_unmap.migrate_pages.migrate_misplaced_page.handle_pte_fault.handle_mm_fault
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 0.95 ±115% +Inf% 2.94 ± 23% perf-profile.cycles-pp.try_to_unmap_one.rmap_walk_anon.rmap_walk.try_to_unmap.migrate_pages
> 0.00 ± -1% +Inf% 3.11 ±158% +Inf% 1.66 ±143% +Inf% 10.32 ± 21% perf-profile.cycles-pp.try_to_wake_up.default_wake_function.autoremove_wake_function.__wake_up_common.__wake_up_sync_key
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 1.33 ±143% +Inf% 8.29 ± 21% perf-profile.cycles-pp.ttwu_do_activate.try_to_wake_up.default_wake_function.autoremove_wake_function.__wake_up_common
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 2.69 ±152% +Inf% 0.84 ± 19% perf-profile.cycles-pp.uart_console_write.serial8250_console_write.univ8250_console_write.call_console_drivers.constprop.23.console_unlock
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 2.69 ±152% +Inf% 0.84 ± 19% perf-profile.cycles-pp.univ8250_console_write.call_console_drivers.constprop.23.console_unlock.vprintk_emit.vprintk_default
> 0.00 ± -1% +NaN% 0.00 ± -1% +Inf% 3.30 ±133% +Inf% 2.70 ± 22% perf-profile.cycles-pp.update_process_times.tick_sched_handle.isra.17.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt
> 0.00 ± -1% +Inf% 6.74 ±159% +Inf% 8.48 ±108% +Inf% 20.19 ± 11% perf-profile.cycles-pp.vfs_read.sys_read.entry_SYSCALL_64_fastpath
> 0.00 ± -1% +Inf% 3.45 ±158% +Inf% 4.55 ±100% +Inf% 11.15 ± 16% perf-profile.cycles-pp.vfs_write.sys_write.entry_SYSCALL_64_fastpath
> 0.00 ± -1% +Inf% 0.08 ±244% +Inf% 2.84 ±153% +Inf% 0.85 ± 20% perf-profile.cycles-pp.vprintk_default.printk.perf_duration_warn.irq_work_run_list.irq_work_run
> 0.00 ± -1% +Inf% 0.08 ±244% +Inf% 2.84 ±153% +Inf% 0.85 ± 20% perf-profile.cycles-pp.vprintk_emit.vprintk_default.printk.perf_duration_warn.irq_work_run_list
> 0.00 ± -1% +Inf% 0.07 ±244% +Inf% 2.59 ±152% +Inf% 0.81 ± 19% perf-profile.cycles-pp.wait_for_xmitr.serial8250_console_putchar.uart_console_write.serial8250_console_write.univ8250_console_write
> 0.00 ± -1% +Inf% 0.25 ±160% +Inf% 0.61 ±100% +Inf% 1.23 ± 26% perf-profile.cycles-pp.worker_thread.kthread.ret_from_fork
> 1.11 ± 61% -85.6% 0.16 ±199% -87.8% 0.14 ±173% -85.8% 0.16 ±173% perf-profile.cycles-pp.x86_64_start_kernel
> 1.11 ± 61% -97.6% 0.03 ±216% -100.0% 0.00 ± -1% -100.0% 0.00 ± -1% perf-profile.cycles-pp.x86_64_start_reservations

The main increases that stick out to me are on the read() side of the
pipe (both in the copy itself and in the wakeups issued to the writer),
and in NUMA balancing activity (page faults and migrations).

If NUMA balancing doesn't settle, the NUMA page faults can reduce the
throughput of the single ruby script that writes input data to the
pixz pipe. The script then no longer saturates the 48 compression
threads, which would explain the increased incidence of readers
hitting an empty pipe and having to issue wakeups.
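
The NUMA counters in /proc/vmstat would tell us whether balancing
keeps churning for the whole run instead of converging. If it's not
too much trouble, a periodic dump alongside the test would help; a
minimal sketch (numastat.log is just a suggested filename):

while sleep 1; do date >> numastat.log; grep -E 'numa_|pgmigrate_' /proc/vmstat >> numastat.log; done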

But why would NUMA balancing be affected by this patch? The only
place where NUMA balancing uses the watermark directly is to determine
whether it can migrate toward a specific node (and indirectly during
the allocation of a huge page). But your NUMA nodes shouldn't be
anywhere near full: when I run pixz with 48 threads, it consumes
~600MB of memory, and your nodes have 33G each. Surely it should
always find plenty of free memory, even after the patch raised the
watermarks? A difference in THP success rate isn't indicated in the
stats, either.
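
To double-check the THP angle, a before/after snapshot of the thp_
counters in /proc/vmstat on both kernels would settle it:

grep ^thp_ /proc/vmstat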

To be sure, this is a minimal test system with nothing else running,
right?

Could you please collect periodic snapshots of /proc/zoneinfo while
the pixz test is running? Something like this would be great:

while sleep 1; do cat /proc/zoneinfo >> zoneinfo.log; done

(both on the last good and the first bad commit)
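
If it's easy, prefixing each snapshot with a timestamp would make the
log easier to correlate with the benchmark timeline, e.g.:

while sleep 1; do date >> zoneinfo.log; cat /proc/zoneinfo >> zoneinfo.log; done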

Thanks