Re: [lkp] [ipc/sem.c] 0882cba0a0: aim9.shared_memory.ops_per_sec -8.8% regression

From: Ye Xiaolong
Date: Mon Oct 10 2016 - 02:26:57 EST


On 10/09, Manfred Spraul wrote:
>Hi,
>
>On 10/09/2016 09:05 AM, kernel test robot wrote:
>>FYI, we noticed a -8.8% regression of aim9.shared_memory.ops_per_sec due to commit:
>>
>>commit 0882cba0a03bca73acd8fab8fb50db04691908e9 ("ipc/sem.c: fix complex_count vs. simple op race")
>>https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
>>
>>in testcase: aim9
>>on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G memory
>>with following parameters:
>>
>> testtime: 300s
>> test: shared_memory
>> cpufreq_governor: performance
>>
>>Suite IX is the "AIM Independent Resource Benchmark:" the famous synthetic benchmark.
>>
>>
>It is probably caused by this change:
>>--- ipc/sem-fast-but-wrong.c 2016-10-09 19:24:47.914825410 +0200
>>+++ ipc/sem.c 2016-10-09 19:24:57.960841540 +0200
>>@@ -363,6 +363,14 @@
>> */
>> spin_lock(&sem->lock);
>>
>>+ /*
>>+ * See 51d7d5205d33
>>+ * ("powerpc: Add smp_mb() to arch_spin_is_locked()"):
>>+ * A full barrier is required: the write of sem->lock
>>+ * must be visible before the read is executed
>>+ */
>>+ smp_mb();
>>+
>> if (!smp_load_acquire(&sma->complex_mode)) {
>> /* fast path successful! */
>> return sops->sem_num;
>Unfortunately, we need it, at least for powerpc.
>And I do not want to add a CONFIG_PPC into ipc/sem.c.
>
>Is it possible to do a test what happens with patch that avoids
>spin_unlock_wait()?
>
>https://patchwork.kernel.org/patch/9359365/
>https://patchwork.kernel.org/patch/9359371/
>
>(if possible, with both patches together)

I applied these two patches on top of 0882cba ("ipc/sem.c: fix
complex_count vs. simple op race"), seems they do help recover the
performance back, here is the comparison:


=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
gcc-6/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-ivb-d03/shared_memory/aim9/300s

commit:
bfe738a6c274df47650d09360babc8c1b71ff2a8
0882cba0a03bca73acd8fab8fb50db04691908e9
cc5f1c2b5da85fdf3de370189765af88c8df7b55 ("ipc/sem: Add hysteresis")

bfe738a6c274df47 0882cba0a03bca73acd8fab8fb cc5f1c2b5da85fdf3de3701897
---------------- -------------------------- --------------------------
fail:runs %reproduction fail:runs %reproduction fail:runs
| | | | |
%stddev %change %stddev %change %stddev
\ | \ | \
1377511 ± 0% -8.8% 1256420 ± 0% -0.5% 1370192 ± 0% aim9.shared_memory.ops_per_sec
4133531 ± 0% -8.8% 3770259 ± 0% -0.5% 4111578 ± 0% aim9.time.minor_page_faults
8106 ± 6% -7.7% 7483 ± 8% +24.5% 10092 ± 7% cpuidle.POLL.usage
9.66 ± 0% -2.0% 9.46 ± 0% +0.5% 9.71 ± 0% turbostat.CorWatt
2374 ± 5% -12.4% 2078 ± 1% -5.5% 2243 ± 5% slabinfo.kmalloc-256.active_objs
697.75 ± 4% +33.5% 931.25 ± 5% +43.5% 1001 ± 4% slabinfo.kmalloc-512.active_objs
719.50 ± 3% +32.3% 951.75 ± 5% +41.8% 1020 ± 4% slabinfo.kmalloc-512.num_objs
5184 ± 2% -12.3% 4544 ± 8% -5.9% 4877 ± 8% slabinfo.kmalloc-64.active_objs
5184 ± 2% -12.3% 4544 ± 8% -5.9% 4877 ± 8% slabinfo.kmalloc-64.num_objs
2.124e+09 ± 3% +0.9% 2.144e+09 ± 9% +14.8% 2.439e+09 ± 9% perf-stat.cache-references
10295 ± 6% -8.9% 9376 ± 1% +6.0% 10912 ± 7% perf-stat.cpu-migrations
0.04 ± 24% +59.5% 0.06 ± 51% +508.5% 0.24 ±123% perf-stat.dTLB-load-miss-rate%
5.346e+11 ± 3% -13.1% 4.647e+11 ± 7% +2.9% 5.501e+11 ± 8% perf-stat.dTLB-loads
1.61 ± 1% -9.2% 1.46 ± 1% -0.6% 1.60 ± 1% perf-stat.ipc
4417053 ± 0% -8.2% 4053763 ± 0% -0.5% 4394973 ± 0% perf-stat.minor-faults
4417054 ± 0% -8.2% 4053763 ± 0% -0.5% 4394973 ± 0% perf-stat.page-faults
4478439 ±144% +83.7% 8227939 ± 90% -87.4% 564387 ± 21% sched_debug.cfs_rq:/.load.max
1912666 ±145% +84.7% 3531796 ± 91% -87.7% 235296 ± 19% sched_debug.cfs_rq:/.load.stddev
755.19 ± 79% +4.3% 787.48 ± 63% -55.2% 338.70 ± 15% sched_debug.cfs_rq:/.load_avg.avg
186.08 ± 9% -10.1% 167.38 ± 15% -39.9% 111.77 ± 35% sched_debug.cfs_rq:/.load_avg.min
86.83 ± 22% -8.6% 79.38 ± 31% -38.0% 53.83 ± 27% sched_debug.cfs_rq:/.util_avg.min
222488 ± 2% +0.2% 222984 ± 6% -22.8% 171691 ± 15% sched_debug.cpu.avg_idle.stddev
62.58 ± 56% -18.7% 50.90 ± 41% -93.8% 3.86 ±168% sched_debug.cpu.clock.stddev
62.58 ± 56% -18.7% 50.90 ± 41% -93.8% 3.86 ±168% sched_debug.cpu.clock_task.stddev
4500146 ±144% +82.7% 8220704 ± 92% -87.5% 564387 ± 21% sched_debug.cpu.load.max
1920553 ±145% +83.6% 3525404 ± 92% -87.7% 235296 ± 19% sched_debug.cpu.load.stddev
1.52 ± 11% -26.7% 1.11 ± 33% -24.9% 1.14 ± 18% sched_debug.cpu.nr_running.avg
18404 ± 42% +39.9% 25755 ± 11% +66.9% 30725 ± 7% sched_debug.cpu.ttwu_count.min
7.90 ± 26% -30.4% 5.50 ± 37% -13.0% 6.88 ± 45% sched_debug.rt_rq:/.rt_time.avg
31.58 ± 26% -30.4% 21.97 ± 37% -13.2% 27.42 ± 45% sched_debug.rt_rq:/.rt_time.max
13.67 ± 26% -30.5% 9.51 ± 37% -13.2% 11.86 ± 45% sched_debug.rt_rq:/.rt_time.stddev
3.17 ± 9% -26.7% 2.32 ± 13% +2.0% 3.23 ± 12% perf-profile.calltrace.cycles-pp._raw_spin_lock.sys_semop.entry_SYSCALL_64_fastpath
1.16 ± 4% -15.9% 0.98 ± 14% +1.0% 1.17 ± 11% perf-profile.calltrace.cycles-pp.do_munmap.sys_shmdt.entry_SYSCALL_64_fastpath
5.25 ± 8% -15.7% 4.43 ± 4% -2.3% 5.13 ± 15% perf-profile.calltrace.cycles-pp.do_smart_update.SYSC_semtimedop.sys_semop.entry_SYSCALL_64_fastpath
1.50 ± 5% +68.6% 2.53 ± 10% -36.2% 0.96 ± 28% perf-profile.calltrace.cycles-pp.pid_vnr.SYSC_semtimedop.sys_semop.entry_SYSCALL_64_fastpath
1.24 ± 4% -15.3% 1.05 ± 12% +0.6% 1.25 ± 11% perf-profile.calltrace.cycles-pp.sys_shmdt.entry_SYSCALL_64_fastpath
0.99 ± 12% -20.5% 0.79 ± 6% -5.7% 0.93 ± 17% perf-profile.calltrace.cycles-pp.wake_const_ops.do_smart_wakeup_zero.do_smart_update.SYSC_semtimedop.sys_semop
3.45 ± 9% -24.8% 2.60 ± 12% +1.7% 3.51 ± 13% perf-profile.children.cycles-pp._raw_spin_lock
1.18 ± 4% -15.7% 0.99 ± 14% +0.9% 1.19 ± 12% perf-profile.children.cycles-pp.do_munmap
5.61 ± 8% -15.4% 4.75 ± 3% -3.7% 5.40 ± 15% perf-profile.children.cycles-pp.do_smart_update
1.78 ± 5% +56.7% 2.80 ± 11% -29.1% 1.27 ± 22% perf-profile.children.cycles-pp.pid_vnr
1.24 ± 4% -15.3% 1.05 ± 12% +0.8% 1.25 ± 11% perf-profile.children.cycles-pp.sys_shmdt
1.24 ± 13% -24.7% 0.93 ± 8% -12.7% 1.08 ± 16% perf-profile.children.cycles-pp.wake_const_ops
9.86 ± 8% +47.0% 14.49 ± 9% -8.5% 9.02 ± 13% perf-profile.self.cycles-pp.SYSC_semtimedop
3.45 ± 9% -25.2% 2.58 ± 13% +1.9% 3.51 ± 13% perf-profile.self.cycles-pp._raw_spin_lock
2.26 ± 9% -26.8% 1.65 ± 7% -11.0% 2.01 ± 18% perf-profile.self.cycles-pp.do_smart_update
1.78 ± 5% +56.7% 2.80 ± 11% -29.1% 1.27 ± 22% perf-profile.self.cycles-pp.pid_vnr
1.03 ± 2% -15.3% 0.87 ± 8% -8.7% 0.94 ± 25% perf-profile.self.cycles-pp.security_ipc_permission
1.32 ± 5% -4.0% 1.27 ± 14% +14.8% 1.51 ± 8% perf-profile.self.cycles-pp.security_sem_semop
1.24 ± 13% -24.7% 0.93 ± 8% -12.7% 1.08 ± 16% perf-profile.self.cycles-pp.wake_const_ops

Thanks,
Xiaolong
>
>Thanks,
> Manfred