[linus:master] [perf test] e2cb1db7da: perf-sanity-tests.perf_all_metrics_test.fail

From: kernel test robot
Date: Fri Dec 06 2024 - 01:38:48 EST




Hello,

kernel test robot noticed "perf-sanity-tests.perf_all_metrics_test.fail" on:

commit: e2cb1db7daf8b7863aeec07bb574d3fae54518e6 ("perf test: Update all metrics test like metricgroups test")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master bcc8eda6d34934d80b96adb8dc4ff5dfc632a53a]
[test failed on linux-next/master f486c8aa16b8172f63bddc70116a0c897a7f3f02]

in testcase: perf-sanity-tests
version:
with the following parameters:

perf_compiler: clang



config: x86_64-rhel-9.4-bpf
compiler: gcc-12
test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (Kaby Lake) with 32G memory

(please refer to the attached dmesg/kmsg for the entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202412061343.42b071db-lkp@xxxxxxxxx



2024-12-06 05:35:29 sudo /usr/src/linux-perf-x86_64-rhel-9.4-bpf-e2cb1db7daf8b7863aeec07bb574d3fae54518e6/tools/perf/perf test 109 -v
109: perf all metrics test:
--- start ---
test child forked, pid 14671
Testing tma_core_bound
Testing tma_info_core_ilp
Testing tma_info_memory_l2mpki
Testing tma_memory_bound
Testing tma_info_bad_spec_branch_misprediction_cost
Testing tma_info_bad_spec_ipmisp_indirect
Testing tma_info_bad_spec_ipmispredict
Testing tma_info_bottleneck_irregular_overhead
Testing tma_info_bottleneck_mispredictions
Testing tma_info_branches_callret
Testing tma_info_branches_cond_nt
Testing tma_info_branches_cond_tk
Testing tma_info_branches_jump
Testing tma_branch_mispredicts
Testing tma_clears_resteers
Testing tma_machine_clears
Testing tma_mispredicts_resteers
Testing tma_icache_misses
Testing tma_info_bottleneck_big_code
Testing tma_itlb_misses
Metric 'tma_itlb_misses' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 664.527 usec (+- 0.249 usec)
Average num. events: 45.000 (+- 0.000)
Average time per event 14.767 usec
Average data synthesis took: 712.865 usec (+- 0.236 usec)
Average num. events: 330.000 (+- 0.000)
Average time per event 2.160 usec

Performance counter stats for 'system wide':

<not counted> CPU_CLK_UNHALTED.THREAD_ANY (0.00%)
<not counted> IDQ_UOPS_NOT_DELIVERED.CORE (0.00%)
<not counted> CPU_CLK_UNHALTED.THREAD (0.00%)
<not counted> IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE (0.00%)
<not counted> ICACHE_TAG.STALLS (0.00%)

14.342315124 seconds time elapsed

Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
Testing tma_unknown_branches
Testing tma_info_bad_spec_spec_clears_ratio
Testing tma_other_mispredicts
Testing tma_fused_instructions
Testing tma_info_inst_mix_bptkbranch
Testing tma_info_inst_mix_ipbranch
Testing tma_info_inst_mix_ipcall
Testing tma_info_inst_mix_iptb
Testing tma_info_system_ipfarbranch
Testing tma_info_thread_uptb
Testing tma_non_fused_branches
Testing tma_info_bottleneck_branching_overhead
Testing tma_nop_instructions
Testing tma_divider
Testing tma_info_bottleneck_compute_bound_est
Testing tma_ports_utilized_3m
Testing tma_frontend_bound
Testing tma_info_bottleneck_instruction_fetch_bw
Testing tma_assists
Testing tma_other_nukes
Testing tma_serializing_operation
Testing tma_info_bottleneck_cache_memory_bandwidth
Testing tma_info_bottleneck_cache_memory_latency
Testing tma_l1_hit_latency
Testing tma_l2_bound
Testing tma_l3_hit_latency
Testing tma_mem_latency
Testing tma_store_latency
Testing tma_contested_accesses
Testing tma_data_sharing
Testing tma_false_sharing
Testing tma_fb_full
Testing tma_info_bottleneck_memory_synchronization
Testing tma_mem_bandwidth
Testing tma_sq_full
Testing tma_dtlb_load
Testing tma_dtlb_store
Testing tma_info_bottleneck_memory_data_tlbs
Testing tma_backend_bound
Testing tma_info_bottleneck_other_bottlenecks
Testing tma_info_bottleneck_useful_work
Testing tma_retiring
Testing tma_info_memory_fb_hpki
Testing tma_info_memory_l1mpki
Testing tma_info_memory_l1mpki_load
Testing tma_info_memory_l2hpki_all
Testing tma_info_memory_l2hpki_load
Testing tma_info_memory_l2mpki_all
Testing tma_info_memory_l2mpki_load
Testing tma_l1_bound
Testing tma_l3_bound
Testing tma_info_memory_l2mpki_rfo
Testing tma_fp_scalar
Testing tma_fp_vector
Testing tma_fp_vector_128b
Testing tma_fp_vector_256b
Testing tma_port_0
Testing tma_x87_use
Testing tma_info_botlnk_l0_core_bound_likely
Testing tma_info_core_fp_arith_utilization
Testing tma_info_pipeline_execute
Testing tma_info_system_gflops
Testing tma_info_thread_execute_per_issue
Testing tma_dsb
Metric 'tma_dsb' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 661.175 usec (+- 0.249 usec)
Average num. events: 45.000 (+- 0.000)
Average time per event 14.693 usec
Average data synthesis took: 709.301 usec (+- 0.237 usec)
Average num. events: 329.000 (+- 0.000)
Average time per event 2.156 usec

Performance counter stats for 'system wide':

<not counted> CPU_CLK_UNHALTED.THREAD_ANY (0.00%)
<not counted> IDQ.DSB_CYCLES_OK (0.00%)
<not counted> IDQ_UOPS_NOT_DELIVERED.CORE (0.00%)
<not counted> IDQ.DSB_CYCLES_ANY (0.00%)
<not counted> IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE (0.00%)

14.272249202 seconds time elapsed

Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
Testing tma_info_botlnk_l2_dsb_bandwidth
Testing tma_info_frontend_dsb_coverage
Testing tma_decoder0_alone
Testing tma_dsb_switches
Metric 'tma_dsb_switches' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 666.217 usec (+- 0.248 usec)
Average num. events: 45.000 (+- 0.000)
Average time per event 14.805 usec
Average data synthesis took: 715.290 usec (+- 0.235 usec)
Average num. events: 329.000 (+- 0.000)
Average time per event 2.174 usec

Performance counter stats for 'system wide':

<not counted> DSB2MITE_SWITCHES.PENALTY_CYCLES (0.00%)
<not counted> CPU_CLK_UNHALTED.THREAD_ANY (0.00%)
<not counted> IDQ_UOPS_NOT_DELIVERED.CORE (0.00%)
<not counted> CPU_CLK_UNHALTED.THREAD (0.00%)
<not counted> IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE (0.00%)

14.383362279 seconds time elapsed

Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
Testing tma_info_botlnk_l2_dsb_misses
Testing tma_info_frontend_dsb_switch_cost
Testing tma_info_frontend_ipdsb_miss_ret
Testing tma_mite
Metric 'tma_mite' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 664.894 usec (+- 0.246 usec)
Average num. events: 45.000 (+- 0.000)
Average time per event 14.775 usec
Average data synthesis took: 713.947 usec (+- 0.237 usec)
Average num. events: 329.000 (+- 0.000)
Average time per event 2.170 usec

Performance counter stats for 'system wide':

<not counted> CPU_CLK_UNHALTED.THREAD_ANY (0.00%)
<not counted> IDQ.ALL_MITE_CYCLES_ANY_UOPS (0.00%)
<not counted> IDQ_UOPS_NOT_DELIVERED.CORE (0.00%)
<not counted> IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE (0.00%)
<not counted> IDQ.ALL_MITE_CYCLES_4_UOPS (0.00%)

14.357059784 seconds time elapsed

Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
Testing tma_info_botlnk_l2_ic_misses
Testing tma_info_frontend_fetch_upc
Testing tma_info_frontend_icache_miss_latency
Testing tma_info_frontend_ipunknown_branch
Testing tma_info_memory_tlb_code_stlb_mpki
Testing tma_info_pipeline_fetch_dsb
Testing tma_info_pipeline_fetch_mite
Testing tma_fetch_bandwidth
Testing tma_branch_resteers
Testing tma_lcp
Metric 'tma_lcp' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 660.287 usec (+- 0.244 usec)
Average num. events: 45.000 (+- 0.000)
Average time per event 14.673 usec
Average data synthesis took: 708.815 usec (+- 0.234 usec)
Average num. events: 329.000 (+- 0.000)
Average time per event 2.154 usec

Performance counter stats for 'system wide':

<not counted> CPU_CLK_UNHALTED.THREAD_ANY (0.00%)
<not counted> DECODE.LCP (0.00%)
<not counted> IDQ_UOPS_NOT_DELIVERED.CORE (0.00%)
<not counted> CPU_CLK_UNHALTED.THREAD (0.00%)
<not counted> IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE (0.00%)

14.258487831 seconds time elapsed

Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
Testing tma_ms_switches
Metric 'tma_ms_switches' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 667.896 usec (+- 0.249 usec)
Average num. events: 45.000 (+- 0.000)
Average time per event 14.842 usec
Average data synthesis took: 717.140 usec (+- 0.240 usec)
Average num. events: 329.000 (+- 0.000)
Average time per event 2.180 usec

Performance counter stats for 'system wide':

<not counted> CPU_CLK_UNHALTED.THREAD_ANY (0.00%)
<not counted> IDQ_UOPS_NOT_DELIVERED.CORE (0.00%)
<not counted> CPU_CLK_UNHALTED.THREAD (0.00%)
<not counted> IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE (0.00%)
<not counted> IDQ.MS_SWITCHES (0.00%)

14.420115976 seconds time elapsed

Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
Testing tma_info_core_flopc
Testing tma_info_inst_mix_iparith
Testing tma_info_inst_mix_iparith_avx128
Testing tma_info_inst_mix_iparith_avx256
Testing tma_info_inst_mix_iparith_scalar_dp
Testing tma_info_inst_mix_iparith_scalar_sp
Testing tma_info_inst_mix_ipflop
Testing tma_fetch_latency
Testing tma_fp_arith
Testing tma_fp_assists
Testing tma_info_system_cpu_utilization
Testing tma_info_system_dram_bw_use
Testing tma_info_frontend_l2mpki_code
Testing tma_info_frontend_l2mpki_code_all
Testing tma_info_inst_mix_ipload
Testing tma_info_inst_mix_ipstore
Testing tma_info_memory_core_l1d_cache_fill_bw_2t
Testing tma_info_memory_core_l2_cache_fill_bw_2t
Testing tma_info_memory_core_l3_cache_access_bw_2t
Testing tma_info_memory_core_l3_cache_fill_bw_2t
Testing tma_info_memory_l1d_cache_fill_bw
Testing tma_info_memory_l2_cache_fill_bw
Testing tma_info_memory_l3_cache_access_bw
Testing tma_info_memory_l3_cache_fill_bw
Testing tma_info_memory_l3mpki
Testing tma_info_memory_load_miss_real_latency
Testing tma_info_memory_mix_uc_load_pki
Testing tma_info_memory_mlp
Testing tma_info_memory_tlb_load_stlb_mpki
Testing tma_info_memory_tlb_page_walks_utilization
Testing tma_info_memory_tlb_store_stlb_mpki
Testing tma_info_system_mem_parallel_reads
Testing tma_info_system_mem_read_latency
Testing tma_info_thread_cpi
Testing tma_dram_bound
Testing tma_store_bound
Testing tma_load_stlb_hit
Testing tma_load_stlb_miss
Testing tma_store_stlb_hit
Testing tma_store_stlb_miss
Testing tma_info_memory_latency_data_l2_mlp
Testing tma_info_memory_latency_load_l2_mlp
Testing tma_info_memory_latency_load_l2_miss_latency
Testing tma_info_pipeline_ipassist
Testing tma_microcode_sequencer
Testing tma_info_system_kernel_cpi
Testing tma_info_system_kernel_utilization
Testing tma_lock_latency
Testing tma_info_pipeline_retire
Testing tma_info_thread_clks
Testing tma_info_thread_uoppi
Testing tma_memory_operations
Testing tma_other_light_ops
Testing tma_ports_utilization
Testing tma_ports_utilized_0
Testing tma_ports_utilized_1
Testing tma_ports_utilized_2
Testing C2_Pkg_Residency
Testing C3_Core_Residency
Testing C3_Pkg_Residency
Testing C6_Core_Residency
Testing C6_Pkg_Residency
Testing C7_Core_Residency
Testing C7_Pkg_Residency
Testing tma_info_core_epc
Testing tma_info_system_core_frequency
Testing tma_info_system_turbo_utilization
Testing tma_info_inst_mix_ipswpf
Testing tma_info_core_coreipc
Testing tma_info_thread_ipc
Testing tma_heavy_operations
Testing tma_light_operations
Testing tma_info_core_core_clks
Testing tma_info_system_smt_2t_utilization
Testing UNCORE_FREQ
Testing tma_info_system_socket_clks
Testing tma_info_inst_mix_instructions
Testing tma_info_system_cpus_utilized
Testing tma_bad_speculation
Testing tma_info_thread_slots
Testing tma_few_uops_instructions
Testing tma_4k_aliasing
Testing tma_cisc
Testing tma_split_loads
Testing tma_split_stores
Testing tma_store_fwd_blk
Testing tma_alu_op_utilization
Metric 'tma_alu_op_utilization' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 665.830 usec (+- 0.241 usec)
Average num. events: 45.000 (+- 0.000)
Average time per event 14.796 usec
Average data synthesis took: 713.746 usec (+- 0.234 usec)
Average num. events: 328.000 (+- 0.000)
Average time per event 2.176 usec

Performance counter stats for 'system wide':

<not counted> CPU_CLK_UNHALTED.THREAD_ANY (0.00%)
<not counted> UOPS_DISPATCHED_PORT.PORT_6 (0.00%)
<not counted> UOPS_DISPATCHED_PORT.PORT_0 (0.00%)
<not counted> UOPS_DISPATCHED_PORT.PORT_5 (0.00%)
<not counted> UOPS_DISPATCHED_PORT.PORT_1 (0.00%)

14.364344905 seconds time elapsed

Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
Testing tma_load_op_utilization
Metric 'tma_load_op_utilization' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 667.283 usec (+- 0.248 usec)
Average num. events: 45.000 (+- 0.000)
Average time per event 14.829 usec
Average data synthesis took: 714.417 usec (+- 0.231 usec)
Average num. events: 328.000 (+- 0.000)
Average time per event 2.178 usec

Performance counter stats for 'system wide':

<not counted> CPU_CLK_UNHALTED.THREAD_ANY (0.00%)
<not counted> UOPS_DISPATCHED_PORT.PORT_3 (0.00%)
<not counted> UOPS_DISPATCHED_PORT.PORT_2 (0.00%)
<not counted> UOPS_DISPATCHED_PORT.PORT_7 (0.00%)
<not counted> UOPS_DISPATCHED_PORT.PORT_4 (0.00%)

14.385195742 seconds time elapsed

Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
Testing tma_mixing_vectors
Testing tma_store_op_utilization
Testing tma_port_1
Testing tma_port_2
Testing tma_port_3
Testing tma_port_4
Testing tma_port_5
Testing tma_port_6
Testing tma_port_7
Testing smi_cycles
Testing smi_num
Testing tsx_aborted_cycles
Testing tsx_cycles_per_elision
Testing tsx_cycles_per_transaction
Testing tsx_transactional_cycles
---- end(-1) ----
109: perf all metrics test : FAILED!
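
For triage, the failing metrics can be pulled out of the verbose output above with a small script. The following is only a sketch, not part of perf or lkp-tests: the function name failing_metrics and the sample text are hypothetical, and it assumes the log keeps the "Metric '...' not printed in:" and "<not counted> EVENT" line shapes seen in this report.

```python
import re

# Line printed by the test when a metric is missing from the perf stat output.
NOT_PRINTED = re.compile(r"^Metric '([^']+)' not printed in:")
# Event that perf stat reported as not counted.
NOT_COUNTED = re.compile(r"^<not counted>\s+(\S+)")


def failing_metrics(log_text):
    """Map each 'not printed' metric to the events listed as <not counted>.

    Hypothetical helper: it relies only on the line shapes visible in the
    log above and makes no claim about other perf versions.
    """
    result = {}
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        match = NOT_PRINTED.match(line)
        if match:
            current = match.group(1)
            result[current] = []
            continue
        if line.startswith("Testing "):
            # A new metric starts, so stop attributing events to the old one.
            current = None
            continue
        match = NOT_COUNTED.match(line)
        if match and current is not None:
            result[current].append(match.group(1))
    return result


# Shortened excerpt of the log above, used only to exercise the helper.
sample = """\
Testing tma_itlb_misses
Metric 'tma_itlb_misses' not printed in:
<not counted> CPU_CLK_UNHALTED.THREAD_ANY (0.00%)
<not counted> ICACHE_TAG.STALLS (0.00%)
Testing tma_unknown_branches
"""

for metric, events in failing_metrics(sample).items():
    print(metric, events)
```

Run against the full log in this report, a helper of this kind would be expected to list the eight metrics reported as not printed (tma_itlb_misses, tma_dsb, tma_dsb_switches, tma_mite, tma_lcp, tma_ms_switches, tma_alu_op_utilization, tma_load_op_utilization), each with its five uncounted events.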



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241206/202412061343.42b071db-lkp@xxxxxxxxx


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki