[linus:master] [workqueue] fe089f87cc: fxmark.ssd_xfs_dbench_client_72_bufferedio.works/sec -32.0% regression

From: kernel test robot
Date: Mon Sep 18 2023 - 11:43:36 EST




Hello,

kernel test robot noticed a -32.0% regression of fxmark.ssd_xfs_dbench_client_72_bufferedio.works/sec on:


commit: fe089f87cccb066e8ad20f49ddf05e95adc1fa8d ("workqueue: Not all work insertion needs to wake up a worker")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: fxmark
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
parameters:

disk: 1SSD
media: ssd
test: dbench_client
fstype: xfs
directio: bufferedio
cpufreq_governor: performance




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202309182202.4bab967a-oliver.sang@xxxxxxxxx


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230918/202309182202.4bab967a-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/directio/disk/fstype/kconfig/media/rootfs/tbox_group/test/testcase:
gcc-12/performance/bufferedio/1SSD/xfs/x86_64-rhel-8.3/ssd/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp5/dbench_client/fxmark

commit:
c0ab017d43 ("workqueue: Cleanups around process_scheduled_works()")
fe089f87cc ("workqueue: Not all work insertion needs to wake up a worker")

c0ab017d43f4c414 fe089f87cccb066e8ad20f49ddf
---------------- ---------------------------
%stddev %change %stddev
\ | \
39209971 -10.4% 35133502 ± 2% cpuidle..usage
52.91 +5.1% 55.59 iostat.cpu.idle
22.75 -9.7% 20.56 ± 2% iostat.cpu.iowait
10466069 ± 2% -29.5% 7376773 ± 11% numa-numastat.node1.local_node
10510900 ± 2% -28.8% 7480808 ± 11% numa-numastat.node1.numa_hit
1179 ± 15% -22.0% 919.83 ± 5% perf-c2c.DRAM.remote
1145 ± 14% -29.3% 809.67 ± 9% perf-c2c.HITM.local
46807 ± 21% -45.8% 25346 ± 50% numa-meminfo.node0.Active(anon)
504904 ± 25% -45.3% 276367 ± 47% numa-meminfo.node0.Shmem
66660 ± 8% -33.3% 44451 ± 18% numa-meminfo.node1.Dirty
-828.44 -61.2% -321.18 sched_debug.cpu.nr_uninterruptible.avg
-5154 -58.8% -2121 sched_debug.cpu.nr_uninterruptible.min
892.09 ± 6% -49.8% 447.61 ± 21% sched_debug.cpu.nr_uninterruptible.stddev
142456 ± 2% -7.6% 131656 ± 3% vmstat.io.bo
124966 ± 2% -20.0% 99939 ± 4% vmstat.system.cs
51184 -8.8% 46683 ± 2% vmstat.system.in
124596 ± 2% -19.9% 99792 ± 4% perf-stat.i.context-switches
3454 ± 2% -31.2% 2377 ± 10% perf-stat.i.cpu-migrations
124479 ± 2% -19.9% 99650 ± 4% perf-stat.ps.context-switches
3453 ± 2% -31.2% 2374 ± 10% perf-stat.ps.cpu-migrations
131582 ± 15% -23.7% 100437 ± 14% meminfo.Active
59138 ± 34% -52.4% 28159 ± 49% meminfo.Active(anon)
1088512 ± 22% -30.6% 755813 ± 19% meminfo.Committed_AS
380805 -11.5% 337183 ± 3% meminfo.Dirty
1235425 ± 18% -24.4% 933365 ± 14% meminfo.Inactive
902090 ± 24% -33.8% 596946 ± 21% meminfo.Inactive(anon)
642365 ± 37% -52.0% 308283 ± 46% meminfo.Shmem
11701 ± 21% -45.9% 6336 ± 50% numa-vmstat.node0.nr_active_anon
126226 ± 25% -45.3% 69093 ± 47% numa-vmstat.node0.nr_shmem
11701 ± 21% -45.9% 6336 ± 50% numa-vmstat.node0.nr_zone_active_anon
10617306 ± 2% -29.5% 7484793 ± 12% numa-vmstat.node1.nr_dirtied
16646 ± 8% -33.2% 11111 ± 18% numa-vmstat.node1.nr_dirty
2570325 ± 2% -28.0% 1849961 ± 12% numa-vmstat.node1.nr_written
16619 ± 8% -33.2% 11093 ± 18% numa-vmstat.node1.nr_zone_write_pending
10510971 ± 2% -28.8% 7480700 ± 11% numa-vmstat.node1.numa_hit
10466140 ± 2% -29.5% 7376665 ± 11% numa-vmstat.node1.numa_local
14789 ± 34% -52.4% 7043 ± 48% proc-vmstat.nr_active_anon
73542893 -12.1% 64665529 ± 4% proc-vmstat.nr_dirtied
95176 -11.2% 84481 ± 3% proc-vmstat.nr_dirty
949397 ± 6% -8.7% 866798 ± 4% proc-vmstat.nr_file_pages
225533 ± 24% -33.8% 149239 ± 21% proc-vmstat.nr_inactive_anon
160600 ± 37% -52.0% 77072 ± 46% proc-vmstat.nr_shmem
17820341 -10.9% 15872978 ± 3% proc-vmstat.nr_written
14789 ± 34% -52.4% 7043 ± 48% proc-vmstat.nr_zone_active_anon
225533 ± 24% -33.8% 149239 ± 21% proc-vmstat.nr_zone_inactive_anon
94869 -11.3% 84180 ± 3% proc-vmstat.nr_zone_write_pending
72272965 -11.9% 63654005 ± 4% proc-vmstat.numa_hit
72137253 -11.9% 63522521 ± 4% proc-vmstat.numa_local
1532444 ± 4% -15.8% 1290917 ± 4% proc-vmstat.pgactivate
72444739 -11.9% 63805246 ± 4% proc-vmstat.pgalloc_normal
170396 ± 37% -52.2% 81460 ± 46% proc-vmstat.pgdeactivate
72261159 -11.8% 63709587 ± 4% proc-vmstat.pgfree
84054518 -7.2% 77981052 ± 2% proc-vmstat.pgpgout
2064 +20.2% 2480 ± 7% fxmark.ssd_xfs_dbench_client_54_bufferedio.idle_sec
63.92 +19.5% 76.41 ± 7% fxmark.ssd_xfs_dbench_client_54_bufferedio.idle_util
743.56 ± 2% -42.5% 427.65 ± 33% fxmark.ssd_xfs_dbench_client_54_bufferedio.iowait_sec
23.03 ± 2% -42.8% 13.18 ± 33% fxmark.ssd_xfs_dbench_client_54_bufferedio.iowait_util
41.98 +32.2% 55.50 ± 12% fxmark.ssd_xfs_dbench_client_54_bufferedio.irq_sec
1.30 +31.5% 1.71 ± 12% fxmark.ssd_xfs_dbench_client_54_bufferedio.irq_util
10.09 -17.7% 8.31 ± 11% fxmark.ssd_xfs_dbench_client_54_bufferedio.softirq_sec
0.31 -18.1% 0.26 ± 11% fxmark.ssd_xfs_dbench_client_54_bufferedio.softirq_util
297.40 -26.7% 218.07 ± 19% fxmark.ssd_xfs_dbench_client_54_bufferedio.sys_sec
9.21 -27.0% 6.72 ± 19% fxmark.ssd_xfs_dbench_client_54_bufferedio.sys_util
71.92 -22.2% 55.94 ± 15% fxmark.ssd_xfs_dbench_client_54_bufferedio.user_sec
2.23 -22.6% 1.72 ± 15% fxmark.ssd_xfs_dbench_client_54_bufferedio.user_util
5264 -28.2% 3777 ± 19% fxmark.ssd_xfs_dbench_client_54_bufferedio.works/sec
2971 +19.2% 3543 fxmark.ssd_xfs_dbench_client_72_bufferedio.idle_sec
68.77 +18.6% 81.58 fxmark.ssd_xfs_dbench_client_72_bufferedio.idle_util
910.14 ± 6% -51.0% 446.38 ± 11% fxmark.ssd_xfs_dbench_client_72_bufferedio.iowait_sec
21.06 ± 6% -51.2% 10.28 ± 11% fxmark.ssd_xfs_dbench_client_72_bufferedio.iowait_util
76.95 +31.1% 100.90 ± 2% fxmark.ssd_xfs_dbench_client_72_bufferedio.irq_sec
1.78 +30.4% 2.32 ± 2% fxmark.ssd_xfs_dbench_client_72_bufferedio.irq_util
11.05 -15.6% 9.33 ± 2% fxmark.ssd_xfs_dbench_client_72_bufferedio.softirq_sec
0.26 -16.0% 0.21 ± 2% fxmark.ssd_xfs_dbench_client_72_bufferedio.softirq_util
281.46 ± 2% -31.1% 193.89 ± 5% fxmark.ssd_xfs_dbench_client_72_bufferedio.sys_sec
6.51 ± 2% -31.5% 4.46 ± 5% fxmark.ssd_xfs_dbench_client_72_bufferedio.sys_util
70.03 ± 3% -29.2% 49.56 ± 5% fxmark.ssd_xfs_dbench_client_72_bufferedio.user_sec
1.62 ± 3% -29.6% 1.14 ± 5% fxmark.ssd_xfs_dbench_client_72_bufferedio.user_util
4529 ± 2% -32.0% 3080 ± 5% fxmark.ssd_xfs_dbench_client_72_bufferedio.works/sec
5.884e+08 -12.1% 5.173e+08 ± 4% fxmark.time.file_system_outputs
8226682 -26.2% 6068375 ± 7% fxmark.time.involuntary_context_switches
204845 -1.2% 202350 fxmark.time.minor_page_faults
209.83 -16.0% 176.17 ± 5% fxmark.time.percent_of_cpu_this_job_got
943.47 -17.0% 783.39 ± 5% fxmark.time.system_time
299.18 -11.7% 264.25 ± 3% fxmark.time.user_time
18576211 -16.6% 15496778 ± 5% fxmark.time.voluntary_context_switches
0.01 ± 41% -90.9% 0.00 ±223% perf-sched.sch_delay.avg.ms.__cond_resched.__mutex_lock.constprop.0.__flush_workqueue
0.01 ± 89% +490.2% 0.05 ± 42% perf-sched.sch_delay.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_force_lsn
0.03 ± 54% +97.5% 0.05 ± 30% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.xfs_can_free_eofblocks.xfs_release.__fput
0.01 ± 30% -93.2% 0.00 ±223% perf-sched.sch_delay.max.ms.__cond_resched.__mutex_lock.constprop.0.__flush_workqueue
0.41 ± 18% -49.0% 0.21 ± 56% perf-sched.sch_delay.max.ms.__cond_resched.down_read.xlog_cil_commit.__xfs_trans_commit.xfs_vn_update_time
770542 ± 44% -58.4% 320650 ± 55% perf-sched.total_wait_and_delay.count.ms
0.46 ± 59% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.do_unlinkat.__x64_sys_unlink.do_syscall_64
555.61 ± 19% +28.0% 711.18 ± 8% perf-sched.wait_and_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
177.28 ± 8% +18.7% 210.40 ± 7% perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
0.67 ± 12% +38.1% 0.93 ± 23% perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.folio_wait_writeback.__filemap_fdatawait_range
52.59 ± 71% -98.7% 0.70 ±223% perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_killable.__kthread_create_on_node
1.80 ± 28% +194.2% 5.28 ± 31% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork
0.67 ± 13% -43.0% 0.38 ± 56% perf-sched.wait_and_delay.avg.ms.xlog_force_lsn.xfs_log_force_seq.xfs_file_fsync.__x64_sys_fsync
4396 ± 49% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.dput.do_unlinkat.__x64_sys_unlink.do_syscall_64
18311 ± 52% -68.1% 5841 ± 62% perf-sched.wait_and_delay.count.__cond_resched.dput.path_put.user_statfs.__do_sys_statfs
5076 ± 44% -83.9% 817.33 ±162% perf-sched.wait_and_delay.count.__cond_resched.iomap_write_iter.iomap_file_buffered_write.xfs_file_buffered_write.vfs_write
120.67 ± 3% -37.6% 75.33 ± 71% perf-sched.wait_and_delay.count.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
262.17 ± 39% -51.7% 126.50 ± 51% perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
8.42 ± 33% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.dput.do_unlinkat.__x64_sys_unlink.do_syscall_64
10.54 ± 10% +96.1% 20.67 ± 83% perf-sched.wait_and_delay.max.ms.__cond_resched.dput.path_put.user_statfs.__do_sys_statfs
554.95 ± 87% -92.9% 39.45 ±223% perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_killable.__kthread_create_on_node
0.43 ± 41% -98.3% 0.01 ±223% perf-sched.wait_time.avg.ms.__cond_resched.__mutex_lock.constprop.0.__flush_workqueue
555.59 ± 19% +28.0% 711.16 ± 8% perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
177.24 ± 8% +18.7% 210.35 ± 7% perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
0.33 ± 41% +293.7% 1.28 ± 39% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_common_interrupt
0.56 ± 46% +195.7% 1.65 ± 57% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function_single
0.66 ± 13% +37.6% 0.91 ± 22% perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.folio_wait_writeback.__filemap_fdatawait_range
52.58 ± 71% -95.0% 2.61 ± 91% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_killable.__kthread_create_on_node
1.79 ± 28% +194.6% 5.27 ± 31% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork
7.29 ± 25% +39.4% 10.17 ± 12% perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.folio_alloc.__filemap_get_folio.iomap_write_begin
1.32 ± 87% -81.7% 0.24 ± 80% perf-sched.wait_time.max.ms.__cond_resched.__filemap_fdatawait_range.file_write_and_wait_range.xfs_file_fsync.__x64_sys_fsync
5.99 ± 31% +69.4% 10.14 ± 10% perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc.kmem_alloc.xfs_dir_lookup
0.73 ± 66% -99.0% 0.01 ±223% perf-sched.wait_time.max.ms.__cond_resched.__mutex_lock.constprop.0.__flush_workqueue
8.41 ± 9% +36.1% 11.44 ± 7% perf-sched.wait_time.max.ms.__cond_resched.down.xfs_buf_lock.xfs_buf_find_lock.xfs_buf_lookup
3.42 ± 46% -72.2% 0.95 ± 80% perf-sched.wait_time.max.ms.__cond_resched.down_read.xfs_file_fsync.__x64_sys_fsync.do_syscall_64
6.45 ± 39% +55.2% 10.00 ± 10% perf-sched.wait_time.max.ms.__cond_resched.down_read.xfs_ilock.xfs_file_buffered_read.xfs_file_read_iter
8.75 ± 20% +37.4% 12.02 ± 8% perf-sched.wait_time.max.ms.__cond_resched.down_read.xfs_ilock_data_map_shared.xfs_dir_open.do_dentry_open
7.50 ± 11% +39.9% 10.49 ± 11% perf-sched.wait_time.max.ms.__cond_resched.down_read.xlog_cil_commit.__xfs_trans_commit.xfs_create
3.28 ± 79% -67.8% 1.06 ± 29% perf-sched.wait_time.max.ms.__cond_resched.down_read.xlog_cil_commit.__xfs_trans_commit.xfs_iomap_write_unwritten
6.70 ± 28% +56.6% 10.49 ± 9% perf-sched.wait_time.max.ms.__cond_resched.down_write.xfs_ilock.xfs_ilock_for_iomap.xfs_buffered_write_iomap_begin
8.38 ± 34% +56.7% 13.13 ± 19% perf-sched.wait_time.max.ms.__cond_resched.dput.do_unlinkat.__x64_sys_unlink.do_syscall_64
9.59 ± 14% +33.7% 12.82 ± 4% perf-sched.wait_time.max.ms.__cond_resched.dput.open_last_lookups.path_openat.do_filp_open
9.16 ± 23% +29.4% 11.86 ± 8% perf-sched.wait_time.max.ms.__cond_resched.dput.path_put.vfs_statx.vfs_fstatat
8.25 ± 25% +41.5% 11.68 ± 7% perf-sched.wait_time.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
7.87 ± 28% +45.3% 11.43 ± 9% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
8.36 ± 28% +43.2% 11.97 ± 15% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
8.65 ± 11% +37.6% 11.90 ± 2% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
7.41 ± 32% +45.8% 10.81 ± 13% perf-sched.wait_time.max.ms.__cond_resched.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlink




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki