Re: [PATCH 1/2] lib/find: Make functions safe on changing bitmaps

From: kernel test robot
Date: Wed Oct 25 2023 - 03:19:41 EST

Hello,

kernel test robot noticed a 3.7% improvement in will-it-scale.per_thread_ops on:

commit: df671b17195cd6526e029c70d04dfb72561082d7 ("[PATCH 1/2] lib/find: Make functions safe on changing bitmaps")
url: https://github.com/intel-lab-lkp/linux/commits/Jan-Kara/lib-find-Make-functions-safe-on-changing-bitmaps/20231011-230553
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 1c8b86a3799f7e5be903c3f49fcdaee29fd385b5
patch link: https://lore.kernel.org/all/20231011150252.32737-1-jack@xxxxxxx/
patch subject: [PATCH 1/2] lib/find: Make functions safe on changing bitmaps
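The patch subject is about making the find_*_bit() helpers safe when the bitmap changes while it is being scanned. A common way to get that property is to load each bitmap word exactly once, the way the kernel's READ_ONCE() does. That approach is an assumption here, not a description of the patch's actual diff. The userspace sketch below shows the technique: each word is fetched into a local through a single volatile access, so the compiler cannot re-read a word between the zero test and the find-first-set step. A concurrent writer can make the result stale, but the code never acts on two different values of the same word. The names read_once_ul() and sketch_find_next_bit() are hypothetical, and __builtin_ctzl() is the GCC/Clang builtin.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the kernel's READ_ONCE(): one volatile load. */
static unsigned long read_once_ul(const unsigned long *p)
{
	return *(const volatile unsigned long *)p;
}

/*
 * Return the index of the first set bit at or after 'start' in a bitmap of
 * 'size' bits, or 'size' if there is none. Every word is read once into
 * 'val' and only that local copy is tested and scanned.
 */
static size_t sketch_find_next_bit(const unsigned long *addr, size_t size,
				   size_t start)
{
	if (start >= size)
		return size;

	size_t idx = start / BITS_PER_LONG;
	/* Clear the bits below 'start' in the first word. */
	unsigned long val = read_once_ul(&addr[idx]) &
			    (~0UL << (start % BITS_PER_LONG));

	while (!val) {
		idx++;
		if (idx * BITS_PER_LONG >= size)
			return size;
		val = read_once_ul(&addr[idx]);
	}

	/* val is non-zero here, so __builtin_ctzl() is well defined. */
	size_t bit = idx * BITS_PER_LONG + (size_t)__builtin_ctzl(val);
	return bit < size ? bit : size;
}
```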

testcase: will-it-scale
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:

nr_task: 50%
mode: thread
test: tlb_flush3
cpufreq_governor: performance
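With nr_task: 50% on this 104-thread machine, the benchmark runs 52 threads. This is why the table below reports will-it-scale.52.threads. The workload row is essentially per_thread_ops multiplied by that thread count: 13959 × 52 = 725868, which is within 0.01% of the reported 725931. The minimal C sketch below shows that arithmetic. Both helper names are hypothetical, and the integer rounding in tasks_from_percent() is an assumption that happens to give 52 here.

```c
/* Hypothetical helper: threads started for an nr_task percentage of the CPUs. */
static unsigned int tasks_from_percent(unsigned int nr_cpus, unsigned int percent)
{
	return nr_cpus * percent / 100; /* 104 * 50 / 100 == 52 */
}

/* Hypothetical helper: aggregate ops implied by a per-thread average. */
static unsigned long aggregate_ops(unsigned long per_thread_ops, unsigned int threads)
{
	return per_thread_ops * threads;
}
```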

Details are below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231025/202310251458.48b4452d-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/thread/50%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/tlb_flush3/will-it-scale

commit:
1c8b86a379 ("Merge tag 'xsa441-6.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip")
df671b1719 ("lib/find: Make functions safe on changing bitmaps")
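In the comparison table that follows, the %change column means different things depending on the metric. For ordinary counters it is the relative change from the base commit to the patched one. For metrics that are already percentages (the *-rate% rows and every perf-profile.* row), it is the plain difference in percentage points. The C sketch below implements both calculations and checks them against two rows of the table. The function names are hypothetical.

```c
/* Relative change in percent, as used for plain counters such as per_thread_ops. */
static double pct_change(double base, double patched)
{
	return (patched - base) / base * 100.0;
}

/* Absolute change in percentage points, as used for metrics that are already percentages. */
static double pp_change(double base_pct, double patched_pct)
{
	return patched_pct - base_pct;
}
```

Here pct_change(13463, 13959) is about 3.68, which the report rounds to +3.7%. pp_change(37.25, 35.24) is -2.01, shown as -2.0 on the llist_add_batch children row. The column appears to be computed from unrounded values: perf-stat.i.ipc reads 0.16 → 0.17 yet shows +7.7%, not the +6.25% those rounded figures would give.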

1c8b86a3799f7e5b (base)      df671b17195cd6526e029c70d04 (patched)
-----------------------      -------------------------------------
value [± %stddev]  %change   value [± %stddev]  metric
0.14 ± 19% +36.9% 0.19 ± 17% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
2.26e+08 +3.6% 2.343e+08 proc-vmstat.pgfault
0.04 +25.0% 0.05 turbostat.IPC
32666 -15.5% 27605 ± 2% turbostat.POLL
7856 +2.2% 8025 vmstat.system.cs
6331931 +2.3% 6478704 vmstat.system.in
700119 +3.7% 725931 will-it-scale.52.threads
13463 +3.7% 13959 will-it-scale.per_thread_ops
700119 +3.7% 725931 will-it-scale.workload
8.36 -7.3% 7.74 perf-stat.i.MPKI
4.591e+09 +3.4% 4.747e+09 perf-stat.i.branch-instructions
1.832e+08 +2.8% 1.883e+08 perf-stat.i.branch-misses
26.70 -0.3 26.40 perf-stat.i.cache-miss-rate%
7852 +2.2% 8021 perf-stat.i.context-switches
6.43 -7.2% 5.97 perf-stat.i.cpi
769.61 +1.8% 783.29 perf-stat.i.cpu-migrations
6.39e+09 +3.4% 6.606e+09 perf-stat.i.dTLB-loads
2.94e+09 +3.2% 3.035e+09 perf-stat.i.dTLB-stores
78.29 -0.9 77.44 perf-stat.i.iTLB-load-miss-rate%
18959450 +3.5% 19621273 perf-stat.i.iTLB-load-misses
5254435 +8.7% 5713444 perf-stat.i.iTLB-loads
2.236e+10 +7.7% 2.408e+10 perf-stat.i.instructions
1181 +4.0% 1228 perf-stat.i.instructions-per-iTLB-miss
0.16 +7.7% 0.17 perf-stat.i.ipc
0.02 ± 36% -49.6% 0.01 ± 53% perf-stat.i.major-faults
485.08 +3.0% 499.67 perf-stat.i.metric.K/sec
141.71 +3.2% 146.25 perf-stat.i.metric.M/sec
747997 +3.7% 775416 perf-stat.i.minor-faults
3127957 -13.9% 2693728 perf-stat.i.node-loads
26089697 +3.4% 26965335 perf-stat.i.node-store-misses
767569 +3.7% 796095 perf-stat.i.node-stores
747997 +3.7% 775416 perf-stat.i.page-faults
8.35 -7.3% 7.74 perf-stat.overall.MPKI
26.70 -0.3 26.40 perf-stat.overall.cache-miss-rate%
6.43 -7.1% 5.97 perf-stat.overall.cpi
78.30 -0.9 77.45 perf-stat.overall.iTLB-load-miss-rate%
1179 +4.0% 1226 perf-stat.overall.instructions-per-iTLB-miss
0.16 +7.7% 0.17 perf-stat.overall.ipc
9644584 +3.8% 10011125 perf-stat.overall.path-length
4.575e+09 +3.4% 4.731e+09 perf-stat.ps.branch-instructions
1.825e+08 +2.8% 1.876e+08 perf-stat.ps.branch-misses
7825 +2.2% 7995 perf-stat.ps.context-switches
767.16 +1.8% 780.76 perf-stat.ps.cpu-migrations
6.368e+09 +3.4% 6.583e+09 perf-stat.ps.dTLB-loads
2.93e+09 +3.2% 3.025e+09 perf-stat.ps.dTLB-stores
18896725 +3.5% 19555325 perf-stat.ps.iTLB-load-misses
5236456 +8.7% 5693636 perf-stat.ps.iTLB-loads
2.229e+10 +7.6% 2.399e+10 perf-stat.ps.instructions
745423 +3.7% 772705 perf-stat.ps.minor-faults
3117663 -13.9% 2684861 perf-stat.ps.node-loads
26002765 +3.4% 26875267 perf-stat.ps.node-store-misses
764789 +3.7% 793098 perf-stat.ps.node-stores
745423 +3.7% 772705 perf-stat.ps.page-faults
6.752e+12 +7.6% 7.267e+12 perf-stat.total.instructions
19.21 -1.0 18.18 perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
17.00 -0.9 16.09 perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
65.30 -0.6 64.69 perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
65.34 -0.6 64.75 perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
65.98 -0.5 65.45 perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
65.96 -0.5 65.42 perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
9.72 ± 2% -0.5 9.20 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
66.33 -0.5 65.81 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
66.46 -0.5 65.95 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
31.88 -0.4 31.43 perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single
67.72 -0.4 67.28 perf-profile.calltrace.cycles-pp.__madvise
32.15 -0.4 31.73 perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
32.60 -0.4 32.21 perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
32.93 -0.3 32.58 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
31.07 -0.3 30.74 perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single
31.58 -0.3 31.28 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
31.61 -0.3 31.30 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
31.80 -0.3 31.51 perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
8.34 -0.1 8.22 perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask
8.06 -0.1 7.95 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond
7.98 -0.1 7.87 perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch
0.59 ± 3% +0.1 0.65 ± 2% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.testcase
1.46 +0.1 1.53 perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
1.48 +0.1 1.55 perf-profile.calltrace.cycles-pp.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
1.53 +0.1 1.62 perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
2.92 +0.1 3.02 perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
1.26 ± 2% +0.1 1.36 perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
1.84 +0.1 1.96 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
7.87 +0.1 8.00 perf-profile.calltrace.cycles-pp.llist_reverse_order.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
2.03 ± 2% +0.1 2.17 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
2.90 +0.2 3.06 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
2.62 ± 3% +0.2 2.80 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
2.58 ± 3% +0.2 2.76 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
2.95 ± 3% +0.2 3.14 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
2.75 +0.2 2.94 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
4.96 +0.3 5.29 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask
4.92 +0.3 5.25 perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond
5.13 +0.3 5.46 perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
5.08 +0.4 5.44 perf-profile.calltrace.cycles-pp.testcase
37.25 -2.0 35.24 perf-profile.children.cycles-pp.llist_add_batch
62.82 -0.8 62.04 perf-profile.children.cycles-pp.on_each_cpu_cond_mask
62.82 -0.8 62.04 perf-profile.children.cycles-pp.smp_call_function_many_cond
63.70 -0.7 62.98 perf-profile.children.cycles-pp.flush_tlb_mm_range
65.30 -0.6 64.70 perf-profile.children.cycles-pp.zap_page_range_single
65.34 -0.6 64.75 perf-profile.children.cycles-pp.madvise_vma_behavior
65.98 -0.5 65.45 perf-profile.children.cycles-pp.__x64_sys_madvise
65.96 -0.5 65.43 perf-profile.children.cycles-pp.do_madvise
66.52 -0.5 66.01 perf-profile.children.cycles-pp.do_syscall_64
66.65 -0.5 66.16 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
67.79 -0.4 67.36 perf-profile.children.cycles-pp.__madvise
32.94 -0.3 32.60 perf-profile.children.cycles-pp.tlb_finish_mmu
31.74 -0.3 31.43 perf-profile.children.cycles-pp.zap_pte_range
31.76 -0.3 31.46 perf-profile.children.cycles-pp.zap_pmd_range
31.95 -0.3 31.66 perf-profile.children.cycles-pp.unmap_page_range
0.42 ± 2% +0.0 0.46 perf-profile.children.cycles-pp.error_entry
0.20 ± 3% +0.0 0.24 ± 5% perf-profile.children.cycles-pp.up_read
0.69 +0.0 0.74 perf-profile.children.cycles-pp.native_flush_tlb_local
1.47 +0.1 1.55 perf-profile.children.cycles-pp.filemap_map_pages
1.48 +0.1 1.56 perf-profile.children.cycles-pp.do_read_fault
1.54 +0.1 1.62 perf-profile.children.cycles-pp.do_fault
2.75 +0.1 2.86 perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
1.85 +0.1 1.98 perf-profile.children.cycles-pp.__handle_mm_fault
2.04 ± 2% +0.1 2.18 perf-profile.children.cycles-pp.handle_mm_fault
2.63 ± 3% +0.2 2.81 perf-profile.children.cycles-pp.exc_page_fault
2.62 ± 3% +0.2 2.80 perf-profile.children.cycles-pp.do_user_addr_fault
3.24 ± 3% +0.2 3.44 perf-profile.children.cycles-pp.asm_exc_page_fault
3.83 +0.2 4.04 perf-profile.children.cycles-pp.flush_tlb_func
0.69 ± 2% +0.2 0.92 perf-profile.children.cycles-pp._find_next_bit
9.92 +0.3 10.23 perf-profile.children.cycles-pp.llist_reverse_order
5.45 +0.4 5.81 perf-profile.children.cycles-pp.testcase
18.42 +0.5 18.96 perf-profile.children.cycles-pp.asm_sysvec_call_function
16.24 +0.5 16.78 perf-profile.children.cycles-pp.__flush_smp_call_function_queue
15.78 +0.5 16.32 perf-profile.children.cycles-pp.__sysvec_call_function
16.36 +0.5 16.90 perf-profile.children.cycles-pp.sysvec_call_function
27.92 -1.9 26.04 perf-profile.self.cycles-pp.llist_add_batch
0.16 ± 2% +0.0 0.18 ± 4% perf-profile.self.cycles-pp.up_read
0.42 ± 2% +0.0 0.45 perf-profile.self.cycles-pp.error_entry
0.21 ± 4% +0.0 0.24 ± 5% perf-profile.self.cycles-pp.down_read
0.26 ± 2% +0.0 0.29 ± 3% perf-profile.self.cycles-pp.tlb_finish_mmu
2.01 +0.0 2.05 perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
0.68 +0.0 0.73 perf-profile.self.cycles-pp.native_flush_tlb_local
3.10 +0.2 3.26 perf-profile.self.cycles-pp.flush_tlb_func
0.50 ± 2% +0.2 0.68 perf-profile.self.cycles-pp._find_next_bit
9.92 +0.3 10.22 perf-profile.self.cycles-pp.llist_reverse_order
16.10 +0.5 16.64 perf-profile.self.cycles-pp.smp_call_function_many_cond
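The largest shift in the profile is in llist_add_batch. Its self cycles fall by 1.9 percentage points, while smp_call_function_many_cond, llist_reverse_order and _find_next_bit each gain a little. llist_add_batch is a lock-free list push, reached here from smp_call_function_many_cond when it queues TLB-flush requests onto per-CPU lists. The consumer side takes the whole list and reverses it, which corresponds to the llist_reverse_order entries.

The userspace sketch below models both halves with C11 atomics: a compare-and-swap retry loop for the push, and an exchange-then-reverse for the pop. It illustrates the technique only and is not the kernel's code. struct node, struct list_head_sketch and the two function names are simplified, hypothetical stand-ins.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node *next;
};

struct list_head_sketch {
	_Atomic(struct node *) first;
};

/*
 * Push the chain new_first..new_last onto 'head' with a CAS retry loop.
 * Returns true if the list was empty before the push.
 */
static bool sketch_add_batch(struct node *new_first, struct node *new_last,
			     struct list_head_sketch *head)
{
	struct node *first = atomic_load_explicit(&head->first, memory_order_relaxed);

	do {
		/* Link the batch in front of the current first entry. */
		new_last->next = first;
		/* On failure 'first' is updated to the current head and we retry. */
	} while (!atomic_compare_exchange_weak_explicit(&head->first, &first, new_first,
							memory_order_release,
							memory_order_relaxed));
	return first == NULL;
}

/* Detach the whole list in one exchange, then reverse it into push (FIFO) order. */
static struct node *sketch_del_all_reversed(struct list_head_sketch *head)
{
	struct node *n = atomic_exchange_explicit(&head->first, NULL, memory_order_acquire);
	struct node *reversed = NULL;

	while (n) {
		struct node *next = n->next;

		n->next = reversed;
		reversed = n;
		n = next;
	}
	return reversed;
}
```

Under contention, every failed compare-and-swap costs another trip around the loop. That is one plausible reason why a push like this can dominate a profile when many threads flush TLBs of the same mm at once.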

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki