Re: [smp] a32a4d8a81: netperf.Throughput_tps -2.1% regression
From: Nadav Amit
Date: Wed May 19 2021 - 14:17:53 EST
[ +PeterZ for reference ]
> On May 19, 2021, at 7:27 AM, kernel test robot <oliver.sang@xxxxxxxxx> wrote:
>
>
>
> Greeting,
>
> FYI, we noticed a -2.1% regression of netperf.Throughput_tps due to commit:
>
>
> commit: a32a4d8a815c4eb6dc64b8962dc13a9dfae70868 ("smp: Run functions concurrently in smp_call_function_many_cond()")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
>
> in testcase: netperf
> on test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
> with following parameters:
>
> ip: ipv4
> runtime: 300s
> nr_threads: 1
> cluster: cs-localhost
> test: UDP_RR
> cpufreq_governor: performance
> ucode: 0x5003006
>
>
[snip]
> commit:
> v5.12-rc2
> a32a4d8a81 ("smp: Run functions concurrently in smp_call_function_many_cond()")
>
> v5.12-rc2 a32a4d8a815c4eb6dc64b8962dc
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 116903 -2.1% 114404 netperf.Throughput_total_tps
> 116903 -2.1% 114404 netperf.Throughput_tps
> 35066769 -2.1% 34317990 netperf.time.voluntary_context_switches
> 35071059 -2.1% 34321258 netperf.workload
> 67295 +1.5% 68333 proc-vmstat.nr_anon_pages
> 463520 -2.1% 453603 vmstat.system.cs
> 535.28 ± 6% -8.3% 490.97 ± 10% sched_debug.cfs_rq:/.util_est_enqueued.max
> 0.02 ± 8% -10.8% 0.02 ± 4% sched_debug.cpu.nr_running.avg
> 76309820 ± 4% +320.0% 3.205e+08 ±158% cpuidle.C1.time
> 23409116 ± 3% +31.0% 30676822 ± 20% cpuidle.C1.usage
> 46720133 ± 2% -12.9% 40709940 ± 2% cpuidle.POLL.usage
> 5282 ±110% +317.0% 22029 ± 58% numa-vmstat.node3.nr_anon_pages
> 11998 ± 55% +138.7% 28637 ± 45% numa-vmstat.node3.nr_inactive_anon
> 11998 ± 55% +138.7% 28637 ± 45% numa-vmstat.node3.nr_zone_inactive_anon
> 8397 ±136% +588.7% 57827 ± 75% numa-meminfo.node3.AnonHugePages
> 21162 ±110% +316.7% 88189 ± 58% numa-meminfo.node3.AnonPages
> 48780 ± 54% +136.8% 115533 ± 45% numa-meminfo.node3.Inactive
> 48780 ± 54% +136.8% 115533 ± 45% numa-meminfo.node3.Inactive(anon)
> 467040 -2.1% 457094 perf-stat.i.context-switches
> 0.01 ±138% +0.0 0.03 ± 73% perf-stat.i.dTLB-store-miss-rate%
> 9.415e+08 -2.4% 9.188e+08 ± 2% perf-stat.i.dTLB-stores
> 0.01 ±137% +0.0 0.03 ± 73% perf-stat.overall.dTLB-store-miss-rate%
> 465472 -2.1% 455557 perf-stat.ps.context-switches
> 9.385e+08 -2.4% 9.158e+08 ± 2% perf-stat.ps.dTLB-stores
> 1.21 ± 14% +0.2 1.41 ± 5% perf-profile.calltrace.cycles-pp.__ip_append_data.ip_make_skb.udp_sendmsg.sock_sendmsg.__sys_sendto
> 2.05 ± 10% +0.3 2.33 ± 4% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
> 0.06 ± 7% +0.0 0.08 ± 14% perf-profile.children.cycles-pp.__calc_delta
> 0.08 ± 19% +0.0 0.10 ± 9% perf-profile.children.cycles-pp._copy_to_user
> 0.09 ± 22% +0.0 0.12 ± 8% perf-profile.children.cycles-pp._copy_from_user
> 0.12 ± 20% +0.0 0.17 ± 13% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
> 0.14 ± 11% +0.1 0.19 ± 9% perf-profile.children.cycles-pp.skb_release_data
> 1.21 ± 14% +0.2 1.41 ± 5% perf-profile.children.cycles-pp.__ip_append_data
> 2.07 ± 11% +0.3 2.33 ± 4% perf-profile.children.cycles-pp.schedule_idle
> 0.06 ± 7% +0.0 0.08 ± 11% perf-profile.self.cycles-pp.__calc_delta
> 0.19 ± 8% +0.0 0.24 ± 6% perf-profile.self.cycles-pp.__softirqentry_text_start
> 0.24 ± 8% +0.1 0.29 ± 4% perf-profile.self.cycles-pp.__skb_recv_udp
> 0.14 ± 11% +0.1 0.19 ± 9% perf-profile.self.cycles-pp.skb_release_data
> 0.02 ±142% +0.1 0.08 ± 17% perf-profile.self.cycles-pp.sock_alloc_send_pskb
> 0.11 ± 17% +0.1 0.19 ± 13% perf-profile.self.cycles-pp.__ip_append_data
> 0.12 ± 34% +0.1 0.26 ± 22% perf-profile.self.cycles-pp.perf_mux_hrtimer_handler
> 0.87 ± 13% +0.2 1.05 ± 6% perf-profile.self.cycles-pp._raw_spin_lock
> 1287 ± 42% +75.3% 2256 ± 14% interrupts.CPU111.CAL:Function_call_interrupts
> 1326 ± 43% +71.0% 2267 ± 13% interrupts.CPU119.CAL:Function_call_interrupts
> 1300 ± 45% +75.9% 2287 ± 37% interrupts.CPU120.CAL:Function_call_interrupts
> 1299 ± 45% +60.1% 2081 ± 28% interrupts.CPU128.CAL:Function_call_interrupts
> 1305 ± 45% +61.7% 2110 ± 29% interrupts.CPU131.CAL:Function_call_interrupts
> 1299 ± 45% +61.8% 2102 ± 28% interrupts.CPU139.CAL:Function_call_interrupts
> 66.67 ±133% -97.2% 1.83 ±155% interrupts.CPU14.TLB:TLB_shootdowns
> 1299 ± 45% +107.8% 2700 ± 33% interrupts.CPU142.CAL:Function_call_interrupts
> 301.83 ±128% -95.6% 13.17 ±140% interrupts.CPU149.RES:Rescheduling_interrupts
> 389.17 ± 89% -73.5% 103.17 ± 35% interrupts.CPU164.NMI:Non-maskable_interrupts
> 389.17 ± 89% -73.5% 103.17 ± 35% interrupts.CPU164.PMI:Performance_monitoring_interrupts
> 1299 ± 45% +60.2% 2081 ± 28% interrupts.CPU35.CAL:Function_call_interrupts
> 1244 ± 50% +66.8% 2076 ± 27% interrupts.CPU45.CAL:Function_call_interrupts
> 1300 ± 44% +59.5% 2075 ± 28% interrupts.CPU46.CAL:Function_call_interrupts
> 1.50 ± 63% +1422.2% 22.83 ±167% interrupts.CPU47.RES:Rescheduling_interrupts
> 467.33 ± 85% -64.6% 165.67 ± 74% interrupts.CPU58.NMI:Non-maskable_interrupts
> 467.33 ± 85% -64.6% 165.67 ± 74% interrupts.CPU58.PMI:Performance_monitoring_interrupts
> 306.67 ± 75% -59.9% 122.83 ± 16% interrupts.CPU68.NMI:Non-maskable_interrupts
> 306.67 ± 75% -59.9% 122.83 ± 16% interrupts.CPU68.PMI:Performance_monitoring_interrupts
> 1131 ± 27% +61.2% 1822 ± 35% interrupts.CPU85.CAL:Function_call_interrupts
> 1180 ± 31% +79.6% 2119 ± 24% interrupts.CPU86.CAL:Function_call_interrupts
>
Could it be a result of a regression that was resolved by commit
641acbf6fd6 ("smp: Micro-optimize smp_call_function_many_cond()")
or does this report mean that the performance regression also
happened on the -rc?
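
For reference, the %change column in the quoted report appears to be the relative difference between the two mean values. A minimal sketch, assuming that definition (the helper name percent_change is hypothetical and not part of the test robot's tooling), reproduces the headline figures:

```python
def percent_change(base: float, new: float) -> float:
    """Relative change of `new` versus `base`, in percent (assumed definition)."""
    return (new - base) / base * 100.0


# Mean values copied from the quoted report (v5.12-rc2 vs. a32a4d8a81).
throughput = percent_change(116903, 114404)    # netperf.Throughput_tps
ctx_switches = percent_change(463520, 453603)  # vmstat.system.cs

print(f"netperf.Throughput_tps: {throughput:+.1f}%")
print(f"vmstat.system.cs:       {ctx_switches:+.1f}%")
```

Both values round to the -2.1% shown in the table, so the throughput drop and the context-switch drop move in step.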