Re: [RFC PATCH v2 6/6] perf vendor events intel: Add MTL metric json files

From: Ian Rogers
Date: Wed Feb 28 2024 - 11:50:34 EST


On Wed, Feb 28, 2024 at 8:12 AM <weilin.wang@xxxxxxxxx> wrote:
>
> From: Weilin Wang <weilin.wang@xxxxxxxxx>
>
> Add MTL metric json file at TMA4.7 [1]. Some of the metrics' formulas use TPEBS
> retire_latency in MTL.
>
> [1] https://lore.kernel.org/all/20240214011820.644458-1-irogers@xxxxxxxxxx/
>
> Signed-off-by: Weilin Wang <weilin.wang@xxxxxxxxx>
> ---
> .../arch/x86/meteorlake/metricgroups.json | 127 +
> .../arch/x86/meteorlake/mtl-metrics.json | 2531 +++++++++++++++++
> 2 files changed, 2658 insertions(+)
> create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/metricgroups.json
> create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/mtl-metricsjson
>
> diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/metricgroups.json b/tools/perf/pmu-events/arch/x86/meteorlake/metricgroups.json
> new file mode 100644
> index 000000000000..7a03835f262c
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/x86/meteorlake/metricgroups.json
> @@ -0,0 +1,127 @@
> +{
> + "Backend": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "Bad": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "BadSpec": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "BigFootprint": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "BrMispredicts": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "Branches": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "C0Wait": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "CacheHits": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "CodeGen": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "Compute": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "Cor": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "DSB": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "DSBmiss": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "DataSharing": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "Fed": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "FetchBW": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "FetchLat": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "Flops": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "FpScalar": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "FpVector": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "Frontend": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "HPC": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "IcMiss": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "InsType": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "IntVector": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "L2Evicts": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "LSD": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "MachineClears": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "Machine_Clears": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "Mem": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "MemOffcore": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "MemoryBW": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "MemoryBound": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "MemoryLat": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "MemoryTLB": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "Memory_BW": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "Memory_Lat": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "MicroSeq": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "OS": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "Offcore": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "PGO": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "Pipeline": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "PortsUtil": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "Power": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "Prefetches": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "Ret": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "Retire": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "SMT": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "Server": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "Snoop": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "SoC": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "Summary": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "TmaL1": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "TmaL2": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "TmaL3mem": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
> + "TopdownL1": "Metrics for top-down breakdown at level 1",
> + "TopdownL2": "Metrics for top-down breakdown at level 2",
> + "TopdownL3": "Metrics for top-down breakdown at level 3",
> + "TopdownL4": "Metrics for top-down breakdown at level 4",
> + "TopdownL5": "Metrics for top-down breakdown at level 5",
> + "TopdownL6": "Metrics for top-down breakdown at level 6",
> + "tma_L1_group": "Metrics for top-down breakdown at level 1",
> + "tma_L2_group": "Metrics for top-down breakdown at level 2",
> + "tma_L3_group": "Metrics for top-down breakdown at level 3",
> + "tma_L4_group": "Metrics for top-down breakdown at level 4",
> + "tma_L5_group": "Metrics for top-down breakdown at level 5",
> + "tma_L6_group": "Metrics for top-down breakdown at level 6",
> + "tma_alu_op_utilization_group": "Metrics contributing to tma_alu_op_utilization category",
> + "tma_assists_group": "Metrics contributing to tma_assists category",
> + "tma_backend_bound_aux_group": "Metrics contributing to tma_backend_bound_aux category",
> + "tma_backend_bound_group": "Metrics contributing to tma_backend_bound category",
> + "tma_bad_speculation_group": "Metrics contributing to tma_bad_speculation category",
> + "tma_base_group": "Metrics contributing to tma_base category",
> + "tma_branch_mispredicts_group": "Metrics contributing to tma_branch_mispredicts category",
> + "tma_branch_resteers_group": "Metrics contributing to tma_branch_resteers category",
> + "tma_core_bound_group": "Metrics contributing to tma_core_bound category",
> + "tma_dram_bound_group": "Metrics contributing to tma_dram_bound category",
> + "tma_dtlb_load_group": "Metrics contributing to tma_dtlb_load category",
> + "tma_dtlb_store_group": "Metrics contributing to tma_dtlb_store category",
> + "tma_fetch_bandwidth_group": "Metrics contributing to tma_fetch_bandwidth category",
> + "tma_fetch_latency_group": "Metrics contributing to tma_fetch_latency category",
> + "tma_fp_arith_group": "Metrics contributing to tma_fp_arith category",
> + "tma_fp_vector_group": "Metrics contributing to tma_fp_vector category",
> + "tma_frontend_bound_group": "Metrics contributing to tma_frontend_bound category",
> + "tma_heavy_operations_group": "Metrics contributing to tma_heavy_operations category",
> + "tma_int_operations_group": "Metrics contributing to tma_int_operations category",
> + "tma_issue2P": "Metrics related by the issue $issue2P",
> + "tma_issueBM": "Metrics related by the issue $issueBM",
> + "tma_issueBW": "Metrics related by the issue $issueBW",
> + "tma_issueComp": "Metrics related by the issue $issueComp",
> + "tma_issueD0": "Metrics related by the issue $issueD0",
> + "tma_issueFB": "Metrics related by the issue $issueFB",
> + "tma_issueFL": "Metrics related by the issue $issueFL",
> + "tma_issueL1": "Metrics related by the issue $issueL1",
> + "tma_issueLat": "Metrics related by the issue $issueLat",
> + "tma_issueMC": "Metrics related by the issue $issueMC",
> + "tma_issueMS": "Metrics related by the issue $issueMS",
> + "tma_issueMV": "Metrics related by the issue $issueMV",
> + "tma_issueRFO": "Metrics related by the issue $issueRFO",
> + "tma_issueSL": "Metrics related by the issue $issueSL",
> + "tma_issueSO": "Metrics related by the issue $issueSO",
> + "tma_issueSmSt": "Metrics related by the issue $issueSmSt",
> + "tma_issueSpSt": "Metrics related by the issue $issueSpSt",
> + "tma_issueSyncxn": "Metrics related by the issue $issueSyncxn",
> + "tma_issueTLB": "Metrics related by the issue $issueTLB",
> + "tma_l1_bound_group": "Metrics contributing to tma_l1_bound category",
> + "tma_l3_bound_group": "Metrics contributing to tma_l3_bound category",
> + "tma_light_operations_group": "Metrics contributing to tma_light_operations category",
> + "tma_load_op_utilization_group": "Metrics contributing to tma_load_op_utilization category",
> + "tma_machine_clears_group": "Metrics contributing to tma_machine_clears category",
> + "tma_mem_latency_group": "Metrics contributing to tma_mem_latency category",
> + "tma_mem_scheduler_group": "Metrics contributing to tma_mem_scheduler category",
> + "tma_memory_bound_group": "Metrics contributing to tma_memory_bound category",
> + "tma_microcode_sequencer_group": "Metrics contributing to tma_microcode_sequencer category",
> + "tma_mite_group": "Metrics contributing to tma_mite category",
> + "tma_nuke_group": "Metrics contributing to tma_nuke category",
> + "tma_other_light_ops_group": "Metrics contributing to tma_other_light_ops category",
> + "tma_ports_utilization_group": "Metrics contributing to tma_ports_utilization category",
> + "tma_ports_utilized_0_group": "Metrics contributing to tma_ports_utilized_0 category",
> + "tma_ports_utilized_3m_group": "Metrics contributing to tma_ports_utilized_3m category",
> + "tma_resource_bound_group": "Metrics contributing to tma_resource_bound category",
> + "tma_retiring_group": "Metrics contributing to tma_retiring category",
> + "tma_serializing_operation_group": "Metrics contributing to tma_serializing_operation category",
> + "tma_store_bound_group": "Metrics contributing to tma_store_bound category",
> + "tma_store_op_utilization_group": "Metrics contributing to tma_store_op_utilization category"
> +}
> diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/mtl-metrics.json b/tools/perf/pmu-events/arch/x86/meteorlake/mtl-metrics.json
> new file mode 100644
> index 000000000000..12df19538fd6
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/x86/meteorlake/mtl-metrics.json
> @@ -0,0 +1,2531 @@
> +[
> + {
> + "BriefDescription": "C10 residency percent per package",
> + "MetricExpr": "cstate_pkg@c10\\-residency@ / TSC",
> + "MetricGroup": "Power",
> + "MetricName": "C10_Pkg_Residency",
> + "ScaleUnit": "100%"
> + },
> + {
> + "BriefDescription": "C1 residency percent per core",
> + "MetricExpr": "cstate_core@c1\\-residency@ / TSC",
> + "MetricGroup": "Power",
> + "MetricName": "C1_Core_Residency",
> + "ScaleUnit": "100%"
> + },
> + {
> + "BriefDescription": "C2 residency percent per package",
> + "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC",
> + "MetricGroup": "Power",
> + "MetricName": "C2_Pkg_Residency",
> + "ScaleUnit": "100%"
> + },
> + {
> + "BriefDescription": "C3 residency percent per package",
> + "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC",
> + "MetricGroup": "Power",
> + "MetricName": "C3_Pkg_Residency",
> + "ScaleUnit": "100%"
> + },
> + {
> + "BriefDescription": "C6 residency percent per core",
> + "MetricExpr": "cstate_core@c6\\-residency@ / TSC",
> + "MetricGroup": "Power",
> + "MetricName": "C6_Core_Residency",
> + "ScaleUnit": "100%"
> + },
> + {
> + "BriefDescription": "C6 residency percent per package",
> + "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC",
> + "MetricGroup": "Power",
> + "MetricName": "C6_Pkg_Residency",
> + "ScaleUnit": "100%"
> + },
> + {
> + "BriefDescription": "C7 residency percent per core",
> + "MetricExpr": "cstate_core@c7\\-residency@ / TSC",
> + "MetricGroup": "Power",
> + "MetricName": "C7_Core_Residency",
> + "ScaleUnit": "100%"
> + },
> + {
> + "BriefDescription": "C7 residency percent per package",
> + "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC",
> + "MetricGroup": "Power",
> + "MetricName": "C7_Pkg_Residency",
> + "ScaleUnit": "100%"
> + },
> + {
> + "BriefDescription": "C8 residency percent per package",
> + "MetricExpr": "cstate_pkg@c8\\-residency@ / TSC",
> + "MetricGroup": "Power",
> + "MetricName": "C8_Pkg_Residency",
> + "ScaleUnit": "100%"
> + },
> + {
> + "BriefDescription": "C9 residency percent per package",
> + "MetricExpr": "cstate_pkg@c9\\-residency@ / TSC",
> + "MetricGroup": "Power",
> + "MetricName": "C9_Pkg_Residency",
> + "ScaleUnit": "100%"
> + },
> + {
> + "BriefDescription": "Percentage of cycles spent in System Management Interrupts.",
> + "MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
> + "MetricGroup": "smi",
> + "MetricName": "smi_cycles",
> + "MetricThreshold": "smi_cycles > 0.1",
> + "ScaleUnit": "100%"
> + },
> + {
> + "BriefDescription": "Number of SMI interrupts.",
> + "MetricExpr": "msr@smi@",
> + "MetricGroup": "smi",
> + "MetricName": "smi_num",
> + "ScaleUnit": "1SMI#"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to certain allocation restrictions.",
> + "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.ALLOC_RESTRICTIONS@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group",
> + "MetricName": "tma_alloc_restriction",
> + "MetricThreshold": "tma_alloc_restriction > 0.1",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls",
> + "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.ALL@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL1;tma_L1_group",
> + "MetricName": "tma_backend_bound",
> + "MetricThreshold": "tma_backend_bound > 0.1",
> + "MetricgroupNoGroup": "TopdownL1",
> + "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that uops must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count. The rest of these subevents count backend stalls, in cycles, due to an outstanding request which is memory bound vs core bound. The subevents are not slot based events and therefore can not be precisely added or subtracted from the Backend_Bound_Aux subevents which are slot based.",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls",
> + "MetricExpr": "tma_backend_bound",
> + "MetricGroup": "TopdownL1;tma_L1_group",
> + "MetricName": "tma_backend_bound_aux",
> + "MetricThreshold": "tma_backend_bound_aux > 0.2",
> + "MetricgroupNoGroup": "TopdownL1",
> + "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that UOPS must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count. All of these subevents count backend stalls, in slots, due to a resource limitation. These are not cycle based events and therefore can not be precisely added or subtracted from the Backend_Bound subevents which are cycle based. These subevents are supplementary to Backend_Bound and can be used to analyze results from a resource perspective at allocation.",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear",
> + "MetricExpr": "cpu_atom@TOPDOWN_BAD_SPECULATION.ALL@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL1;tma_L1_group",
> + "MetricName": "tma_bad_speculation",
> + "MetricThreshold": "tma_bad_speculation > 0.15",
> + "MetricgroupNoGroup": "TopdownL1",
> + "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a mispredicted jump or a machine clear. Only issue slots wasted due to fast nukes such as memory ordering nukes are counted. Other nukes are not accounted for. Counts all issue slots blocked during this recovery window including relevant microcode flows and while uops are not yet available in the instruction queue (IQ). Also includes the issue slots that were consumed by the backend but were thrown away because they were younger than the mispredict or machine clear.",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of uops that are not from the microsequencer.",
> + "MetricExpr": "(cpu_atom@TOPDOWN_RETIRING.ALL@ - cpu_atom@UOPS_RETIRED.MS@) / tma_info_core_slots",
> + "MetricGroup": "TopdownL2;tma_L2_group;tma_retiring_group",
> + "MetricName": "tma_base",
> + "MetricThreshold": "tma_base > 0.6",
> + "MetricgroupNoGroup": "TopdownL2",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to BACLEARS, which occurs when the Branch Target Buffer (BTB) prediction or lack thereof, was corrected by a later branch predictor in the frontend",
> + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.BRANCH_DETECT@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_latency_group",
> + "MetricName": "tma_branch_detect",
> + "MetricThreshold": "tma_branch_detect > 0.05",
> + "PublicDescription": "Counts the number of issue slots that were not delivered by the frontend due to BACLEARS, which occurs when the Branch Target Buffer (BTB) prediction or lack thereof, was corrected by a later branch predictor in the frontend. Includes BACLEARS due to all branch types including conditional and unconditional jumps, returns, and indirect branches.",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to branch mispredicts.",
> + "MetricExpr": "cpu_atom@TOPDOWN_BAD_SPECULATION.MISPREDICT@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL2;tma_L2_group;tma_bad_speculation_group",
> + "MetricName": "tma_branch_mispredicts",
> + "MetricThreshold": "tma_branch_mispredicts > 0.05",
> + "MetricgroupNoGroup": "TopdownL2",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to BTCLEARS, which occurs when the Branch Target Buffer (BTB) predicts a taken branch.",
> + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.BRANCH_RESTEER@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_latency_group",
> + "MetricName": "tma_branch_resteer",
> + "MetricThreshold": "tma_branch_resteer > 0.05",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to the microcode sequencer (MS).",
> + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.CISC@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_bandwidth_group",
> + "MetricName": "tma_cisc",
> + "MetricThreshold": "tma_cisc > 0.05",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of cycles due to backend bound stalls that are core execution bound and not attributed to outstanding demand load or store stalls.",
> + "MetricExpr": "max(0, tma_backend_bound - tma_memory_bound)",
> + "MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_group",
> + "MetricName": "tma_core_bound",
> + "MetricThreshold": "tma_core_bound > 0.1",
> + "MetricgroupNoGroup": "TopdownL2",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to decode stalls.",
> + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.DECODE@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_bandwidth_group",
> + "MetricName": "tma_decode",
> + "MetricThreshold": "tma_decode > 0.05",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of machine clears relative to the number of nuke slots due to memory disambiguation.",
> + "MetricExpr": "tma_nuke * (cpu_atom@MACHINE_CLEARS.DISAMBIGUATION@ / cpu_atom@MACHINE_CLEARS.SLOW@)",
> + "MetricGroup": "TopdownL4;tma_L4_group;tma_nuke_group",
> + "MetricName": "tma_disambiguation",
> + "MetricThreshold": "tma_disambiguation > 0.02",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of cycles the core is stalled due to a demand load miss which hit in DRAM or MMIO (Non-DRAM).",
> + "MetricExpr": "cpu_atom@MEM_BOUND_STALLS_LOAD.LLC_MISS@ / tma_info_core_clks - max((cpu_atom@MEM_BOUND_STALLS_LOAD.ALL@ - cpu_atom@LD_HEAD.L1_MISS_AT_RET@) / tma_info_core_clks, 0) * cpu_atom@MEM_BOUND_STALLS_LOAD.LLC_MISS@ / cpu_atom@MEM_BOUND_STALLS_LOAD.ALL@",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
> + "MetricName": "tma_dram_bound",
> + "MetricThreshold": "tma_dram_bound > 0.1",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to a machine clear classified as a fast nuke due to memory ordering, memory disambiguation and memory renaming.",
> + "MetricExpr": "cpu_atom@TOPDOWN_BAD_SPECULATION.FASTNUKE@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_machine_clears_group",
> + "MetricName": "tma_fast_nuke",
> + "MetricThreshold": "tma_fast_nuke > 0.05",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to frontend bandwidth restrictions due to decode, predecode, cisc, and other limitations.",
> + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.FRONTEND_BANDWIDTH@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL2;tma_L2_group;tma_frontend_bound_group",
> + "MetricName": "tma_fetch_bandwidth",
> + "MetricThreshold": "tma_fetch_bandwidth > 0.1",
> + "MetricgroupNoGroup": "TopdownL2",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to frontend bandwidth restrictions due to decode, predecode, cisc, and other limitations.",
> + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.FRONTEND_LATENCY@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL2;tma_L2_group;tma_frontend_bound_group",
> + "MetricName": "tma_fetch_latency",
> + "MetricThreshold": "tma_fetch_latency > 0.15",
> + "MetricgroupNoGroup": "TopdownL2",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of machine clears relative to the number of nuke slots due to FP assists.",
> + "MetricExpr": "tma_nuke * (cpu_atom@MACHINE_CLEARS.FP_ASSIST@ / cpu_atom@MACHINE_CLEARS.SLOW@)",
> + "MetricGroup": "TopdownL4;tma_L4_group;tma_nuke_group",
> + "MetricName": "tma_fp_assist",
> + "MetricThreshold": "tma_fp_assist > 0.02",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of floating point divide operations per uop.",
> + "MetricExpr": "cpu_atom@UOPS_RETIRED.FPDIV@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_base_group",
> + "MetricName": "tma_fpdiv_uops",
> + "MetricThreshold": "tma_fpdiv_uops > 0.2",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to frontend stalls.",
> + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.ALL@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL1;tma_L1_group",
> + "MetricName": "tma_frontend_bound",
> + "MetricThreshold": "tma_frontend_bound > 0.2",
> + "MetricgroupNoGroup": "TopdownL1",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to instruction cache misses.",
> + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.ICACHE@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_latency_group",
> + "MetricName": "tma_icache_misses",
> + "MetricThreshold": "tma_icache_misses > 0.05",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "",
> + "MetricExpr": "cpu_atom@CPU_CLK_UNHALTED.CORE@",
> + "MetricName": "tma_info_core_clks",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "",
> + "MetricExpr": "cpu_atom@CPU_CLK_UNHALTED.CORE_P@",
> + "MetricName": "tma_info_core_clks_p",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Cycles Per Instruction",
> + "MetricExpr": "tma_info_core_clks / INST_RETIRED.ANY",
> + "MetricName": "tma_info_core_cpi",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Instructions Per Cycle",
> + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / tma_info_core_clks",
> + "MetricName": "tma_info_core_ipc",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "",
> + "MetricExpr": "6 * tma_info_core_clks",
> + "MetricName": "tma_info_core_slots",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Uops Per Instruction",
> + "MetricExpr": "cpu_atom@UOPS_RETIRED.ALL@ / INST_RETIRED.ANY",
> + "MetricName": "tma_info_core_upi",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Percent of instruction miss cost that hit in DRAM",
> + "MetricExpr": "100 * cpu_atom@MEM_BOUND_STALLS_IFETCH.LLC_MISS@ / cpu_atom@MEM_BOUND_STALLS_IFETCH.ALL@",
> + "MetricName": "tma_info_frontend_inst_miss_cost_dramhit_percent",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Percent of instruction miss cost that hit in the L2",
> + "MetricExpr": "100 * cpu_atom@MEM_BOUND_STALLS_IFETCH.L2_HIT@ / cpu_atom@MEM_BOUND_STALLS_IFETCH.ALL@",
> + "MetricName": "tma_info_frontend_inst_miss_cost_l2hit_percent",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Percent of instruction miss cost that hit in the L3",
> + "MetricExpr": "100 * cpu_atom@MEM_BOUND_STALLS_IFETCH.LLC_HIT@ / cpu_atom@MEM_BOUND_STALLS_IFETCH.ALL@",
> + "MetricName": "tma_info_frontend_inst_miss_cost_l3hit_percent",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Ratio of all branches which mispredict",
> + "MetricExpr": "cpu_atom@BR_MISP_RETIRED.ALL_BRANCHES@ / BR_INST_RETIRED.ALL_BRANCHES",
> + "MetricName": "tma_info_inst_mix_branch_mispredict_ratio",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Ratio between Mispredicted branches and unknown branches",
> + "MetricExpr": "cpu_atom@BR_MISP_RETIRED.ALL_BRANCHES@ / BACLEARSANY",
> + "MetricName": "tma_info_inst_mix_branch_mispredict_to_unknown_branch_ratio",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Percentage of all uops which are FPDiv uops",
> + "MetricExpr": "100 * cpu_atom@UOPS_RETIRED.FPDIV@ / UOPS_RETIREDALL",
> + "MetricName": "tma_info_inst_mix_fpdiv_uop_ratio",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Percentage of all uops which are IDiv uops",
> + "MetricExpr": "100 * cpu_atom@UOPS_RETIRED.IDIV@ / UOPS_RETIRED.ALL",
> + "MetricName": "tma_info_inst_mix_idiv_uop_ratio",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Instructions per Branch (lower number means higher occurrence rate)",
> + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / BR_INST_RETIRED.ALL_BRANCHES",
> + "MetricName": "tma_info_inst_mix_ipbranch",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Instruction per (near) call (lower number means higher occurrence rate)",
> + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / BR_INST_RETIRED.NEAR_CALL",
> + "MetricName": "tma_info_inst_mix_ipcall",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Instructions per Far Branch",
> + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / (cpu_atom@BR_INST_RETIRED.FAR_BRANCH@ / 2)",
> + "MetricName": "tma_info_inst_mix_ipfarbranch",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Instructions per Load",
> + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / MEM_UOPS_RETIRED.ALL_LOADS",
> + "MetricName": "tma_info_inst_mix_ipload",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Instructions per retired conditional Branch Misprediction where the branch was not taken",
> + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / (cpu_atom@BR_MISP_RETIRED.COND@ - cpu_atom@BR_MISP_RETIRED.COND_TAKEN@)",
> + "MetricName": "tma_info_inst_mix_ipmisp_cond_ntaken",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Instructions per retired conditional Branch Misprediction where the branch was taken",
> + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / BR_MISP_RETIRED.COND_TAKEN",
> + "MetricName": "tma_info_inst_mix_ipmisp_cond_taken",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Instructions per retired indirect call or jump Branch Misprediction",
> + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / BR_MISP_RETIRED.INDIRECT",
> + "MetricName": "tma_info_inst_mix_ipmisp_indirect",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Instructions per retired return Branch Misprediction",
> + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / BR_MISP_RETIRED.RETURN",
> + "MetricName": "tma_info_inst_mix_ipmisp_ret",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Instructions per retired Branch Misprediction",
> + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / BR_MISP_RETIRED.ALL_BRANCHES",
> + "MetricName": "tma_info_inst_mix_ipmispredict",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Instructions per Store",
> + "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / MEM_UOPS_RETIRED.ALL_STORES",
> + "MetricName": "tma_info_inst_mix_ipstore",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Percentage of all uops which are ucode ops",
> + "MetricExpr": "100 * cpu_atom@UOPS_RETIRED.MS@ / UOPS_RETIRED.ALL",
> + "MetricName": "tma_info_inst_mix_microcode_uop_ratio",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Percentage of all uops which are x87 uops",
> + "MetricExpr": "100 * cpu_atom@UOPS_RETIRED.X87@ / UOPS_RETIRED.ALL",
> + "MetricName": "tma_info_inst_mix_x87_uop_ratio",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Percentage of total non-speculative loads with a address aliasing block",
> + "MetricExpr": "100 * cpu_atom@LD_BLOCKS.ADDRESS_ALIAS@ / MEM_UOPS_RETIRED.ALL_LOADS",
> + "MetricName": "tma_info_l1_bound_address_alias_blocks",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Percentage of total non-speculative loads that are splits",
> + "MetricExpr": "100 * cpu_atom@MEM_UOPS_RETIRED.SPLIT_LOADS@ / MEM_UOPS_RETIRED.ALL_LOADS",
> + "MetricName": "tma_info_l1_bound_load_splits",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Percentage of total non-speculative loads with a store forward or unknown store address block",
> + "MetricExpr": "100 * cpu_atom@LD_BLOCKS.DATA_UNKNOWN@ / MEM_UOPS_RETIRED.ALL_LOADS",
> + "MetricName": "tma_info_l1_bound_store_fwd_blocks",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Cycle cost per L2 hit",
> + "MetricExpr": "cpu_atom@MEM_BOUND_STALLS_LOAD.L2_HIT@ / MEM_LOAD_UOPS_RETIRED.L2_HIT",
> + "MetricName": "tma_info_memory_cycles_per_demand_load_l2_hit",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Cycle cost per LLC hit",
> + "MetricExpr": "cpu_atom@MEM_BOUND_STALLS_LOAD.LLC_HIT@ / MEM_LOAD_UOPS_RETIRED.L3_HIT",
> + "MetricName": "tma_info_memory_cycles_per_demand_load_l3_hit",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "load ops retired per 1000 instruction",
> + "MetricExpr": "1e3 * cpu_atom@MEM_UOPS_RETIRED.ALL_LOADS@ / INST_RETIRED.ANY",
> + "MetricName": "tma_info_memory_memloadpki",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Average CPU Utilization",
> + "MetricExpr": "cpu_atom@CPU_CLK_UNHALTED.REF_TSC@ / TSC",
> + "MetricName": "tma_info_system_cpu_utilization",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Fraction of cycles spent in Kernel mode",
> + "MetricExpr": "cpu_atom@CPU_CLK_UNHALTED.CORE@k / CPU_CLK_UNHALTED.CORE",
> + "MetricGroup": "Summary",
> + "MetricName": "tma_info_system_kernel_utilization",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Average Frequency Utilization relative nominal frequency",
> + "MetricExpr": "tma_info_core_clks / CPU_CLK_UNHALTED.REF_TSC",
> + "MetricGroup": "Power",
> + "MetricName": "tma_info_system_turbo_utilization",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to Instruction Table Lookaside Buffer (ITLB) misses.",
> + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.ITLB_MISS@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_latency_group",
> + "MetricName": "tma_itlb_misses",
> + "MetricThreshold": "tma_itlb_misses > 0.05",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a load block.",
> + "MetricExpr": "cpu_atom@LD_HEAD.L1_BOUND_AT_RET@ / tma_info_core_clks",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
> + "MetricName": "tma_l1_bound",
> + "MetricThreshold": "tma_l1_bound > 0.1",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of cycles a core is stalled due to a demand load which hit in the L2 Cache.",
> + "MetricExpr": "cpu_atom@MEM_BOUND_STALLS_LOAD.L2_HIT@ / tma_info_core_clks - max((cpu_atom@MEM_BOUND_STALLS_LOAD.ALL@ - cpu_atom@LD_HEAD.L1_MISS_AT_RET@) / tma_info_core_clks, 0) * cpu_atom@MEM_BOUND_STALLS_LOAD.L2_HIT@ / cpu_atom@MEM_BOUND_STALLS_LOAD.ALL@",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
> + "MetricName": "tma_l2_bound",
> + "MetricThreshold": "tma_l2_bound > 0.1",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of cycles a core is stalled due to a demand load which hit in the Last Level Cache (LLC) or other core with HITE/F/M.",
> + "MetricExpr": "cpu_atom@MEM_BOUND_STALLS_LOAD.LLC_HIT@ / tma_info_core_clks - max((cpu_atom@MEM_BOUND_STALLS_LOAD.ALL@ - cpu_atom@LD_HEAD.L1_MISS_AT_RET@) / tma_info_core_clks, 0) * cpu_atom@MEM_BOUND_STALLS_LOAD.LLC_HIT@ / cpu_atom@MEM_BOUND_STALLS_LOAD.ALL@",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
> + "MetricName": "tma_l3_bound",
> + "MetricThreshold": "tma_l3_bound > 0.1",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of cycles, relative to the number of mem_scheduler slots, in which uops are blocked due to load buffer full",
> + "MetricExpr": "tma_mem_scheduler * cpu_atom@MEM_SCHEDULER_BLOCK.LD_BUF@ / MEM_SCHEDULER_BLOCK.ALL",
> + "MetricGroup": "TopdownL4;tma_L4_group;tma_mem_scheduler_group",
> + "MetricName": "tma_ld_buffer",
> + "MetricThreshold": "tma_ld_buffer > 0.05",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a machine clear (nuke) of any kind including memory ordering and memory disambiguation.",
> + "MetricExpr": "cpu_atom@TOPDOWN_BAD_SPECULATION.MACHINE_CLEARS@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL2;tma_L2_group;tma_bad_speculation_group",
> + "MetricName": "tma_machine_clears",
> + "MetricThreshold": "tma_machine_clears > 0.05",
> + "MetricgroupNoGroup": "TopdownL2",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to memory reservation stalls in which a scheduler is not able to accept uops.",
> + "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.MEM_SCHEDULER@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group",
> + "MetricName": "tma_mem_scheduler",
> + "MetricThreshold": "tma_mem_scheduler > 0.1",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of cycles the core is stalled due to stores or loads.",
> + "MetricExpr": "min(tma_backend_bound, cpu_atom@LD_HEAD.ANY_AT_RET@ / tma_info_core_clks + tma_store_bound)",
> + "MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_group",
> + "MetricName": "tma_memory_bound",
> + "MetricThreshold": "tma_memory_bound > 0.2",
> + "MetricgroupNoGroup": "TopdownL2",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of machine clears relative to the number of nuke slots due to memory ordering.",
> + "MetricExpr": "tma_nuke * (cpu_atom@MACHINE_CLEARS.MEMORY_ORDERING@ / cpu_atom@MACHINE_CLEARS.SLOW@)",
> + "MetricGroup": "TopdownL4;tma_L4_group;tma_nuke_group",
> + "MetricName": "tma_memory_ordering",
> + "MetricThreshold": "tma_memory_ordering > 0.02",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of uops that are from the complex flows issued by the micro-sequencer (MS)",
> + "MetricExpr": "cpu_atom@UOPS_RETIRED.MS@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL2;tma_L2_group;tma_retiring_group",
> + "MetricName": "tma_ms_uops",
> + "MetricThreshold": "tma_ms_uops > 0.05",
> + "MetricgroupNoGroup": "TopdownL2",
> + "PublicDescription": "Counts the number of uops that are from the complex flows issued by the micro-sequencer (MS). This includes uops from flows due to complex instructions, faults, assists, and inserted flows.",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to IEC or FPC RAT stalls, which can be due to FIQ or IEC reservation stalls in which the integer, floating point or SIMD scheduler is not able to accept uops.",
> + "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.NON_MEM_SCHEDULER@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group",
> + "MetricName": "tma_non_mem_scheduler",
> + "MetricThreshold": "tma_non_mem_scheduler > 0.1",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to a machine clear (slow nuke).",
> + "MetricExpr": "cpu_atom@TOPDOWN_BAD_SPECULATION.NUKE@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_machine_clears_group",
> + "MetricName": "tma_nuke",
> + "MetricThreshold": "tma_nuke > 0.05",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to other common frontend stalls not categorized.",
> + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.OTHER@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_bandwidth_group",
> + "MetricName": "tma_other_fb",
> + "MetricThreshold": "tma_other_fb > 0.05",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a number of other load blocks.",
> + "MetricExpr": "cpu_atom@LD_HEAD.OTHER_AT_RET@ / tma_info_core_clks",
> + "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
> + "MetricName": "tma_other_l1",
> + "MetricThreshold": "tma_other_l1 > 0.05",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of cycles the core is stalled due to a demand load miss which hits in the L2, LLC, DRAM or MMIO (Non-DRAM) but could not be correctly attributed or cycles in which the load miss is waiting on a request buffer.",
> + "MetricExpr": "max(0, tma_memory_bound - (tma_store_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_dram_bound))",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
> + "MetricName": "tma_other_load_store",
> + "MetricThreshold": "tma_other_load_store > 0.1",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of uops retired excluding ms and fp div uops.",
> + "MetricExpr": "(cpu_atom@TOPDOWN_RETIRING.ALL@ - cpu_atom@UOPS_RETIRED.MS@ - cpu_atom@UOPS_RETIRED.FPDIV@) / tma_info_core_slots",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_base_group",
> + "MetricName": "tma_other_ret",
> + "MetricThreshold": "tma_other_ret > 0.3",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of machine clears relative to the number of nuke slots due to page faults.",
> + "MetricExpr": "tma_nuke * (cpu_atom@MACHINE_CLEARS.PAGE_FAULT@ / cpu_atom@MACHINE_CLEARS.SLOW@)",
> + "MetricGroup": "TopdownL4;tma_L4_group;tma_nuke_group",
> + "MetricName": "tma_page_fault",
> + "MetricThreshold": "tma_page_fault > 0.02",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that were not delivered by the frontend due to wrong predecodes.",
> + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.PREDECODE@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_bandwidth_group",
> + "MetricName": "tma_predecode",
> + "MetricThreshold": "tma_predecode > 0.05",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to the physical register file unable to accept an entry (marble stalls).",
> + "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.REGISTER@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group",
> + "MetricName": "tma_register",
> + "MetricThreshold": "tma_register > 0.1",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to the reorder buffer being full (ROB stalls).",
> + "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.REORDER_BUFFER@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group",
> + "MetricName": "tma_reorder_buffer",
> + "MetricThreshold": "tma_reorder_buffer > 0.1",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls",
> + "MetricExpr": "tma_backend_bound",
> + "MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_aux_group",
> + "MetricName": "tma_resource_bound",
> + "MetricThreshold": "tma_resource_bound > 0.2",
> + "MetricgroupNoGroup": "TopdownL2",
> + "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that uops must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count.",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that result in retirement slots.",
> + "MetricExpr": "cpu_atom@TOPDOWN_RETIRING.ALL@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL1;tma_L1_group",
> + "MetricName": "tma_retiring",
> + "MetricThreshold": "tma_retiring > 0.75",
> + "MetricgroupNoGroup": "TopdownL1",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of cycles, relative to the number of mem_scheduler slots, in which uops are blocked due to RSV full relative",
> + "MetricExpr": "tma_mem_scheduler * cpu_atom@MEM_SCHEDULER_BLOCK.RSV@ / MEM_SCHEDULER_BLOCK.ALL",
> + "MetricGroup": "TopdownL4;tma_L4_group;tma_mem_scheduler_group",
> + "MetricName": "tma_rsv",
> + "MetricThreshold": "tma_rsv > 0.05",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to scoreboards from the instruction queue (IQ), jump execution unit (JEU), or microcode sequencer (MS).",
> + "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.SERIALIZATION@ / tma_info_core_slots",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group",
> + "MetricName": "tma_serialization",
> + "MetricThreshold": "tma_serialization > 0.1",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of machine clears relative to the number of nuke slots due to SMC.",
> + "MetricExpr": "tma_nuke * (cpu_atom@MACHINE_CLEARS.SMC@ / cpu_atom@MACHINE_CLEARS.SLOW@)",
> + "MetricGroup": "TopdownL4;tma_L4_group;tma_nuke_group",
> + "MetricName": "tma_smc",
> + "MetricThreshold": "tma_smc > 0.02",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of cycles, relative to the number of mem_scheduler slots, in which uops are blocked due to store buffer full",
> + "MetricExpr": "tma_store_bound",
> + "MetricGroup": "TopdownL4;tma_L4_group;tma_mem_scheduler_group",
> + "MetricName": "tma_st_buffer",
> + "MetricThreshold": "tma_st_buffer > 0.05",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a first level TLB miss.",
> + "MetricExpr": "cpu_atom@LD_HEAD.DTLB_MISS_AT_RET@ / tma_info_core_clks",
> + "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
> + "MetricName": "tma_stlb_hit",
> + "MetricThreshold": "tma_stlb_hit > 0.05",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a second level TLB miss requiring a page walk.",
> + "MetricExpr": "cpu_atom@LD_HEAD.PGWALK_AT_RET@ / tma_info_core_clks",
> + "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
> + "MetricName": "tma_stlb_miss",
> + "MetricThreshold": "tma_stlb_miss > 0.05",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of cycles the core is stalled due to store buffer full.",
> + "MetricExpr": "tma_mem_scheduler * (cpu_atom@MEM_SCHEDULER_BLOCKST_BUF@ / cpu_atom@MEM_SCHEDULER_BLOCK.ALL@)",
> + "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
> + "MetricName": "tma_store_bound",
> + "MetricThreshold": "tma_store_bound > 0.1",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a store forward block.",
> + "MetricExpr": "cpu_atom@LD_HEAD.ST_ADDR_AT_RET@ / tma_info_core_clks",
> + "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
> + "MetricName": "tma_store_fwd_blk",
> + "MetricThreshold": "tma_store_fwd_blk > 0.05",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_atom"
> + },
> + {
> + "BriefDescription": "Uncore frequency per die [GHZ]",
> + "MetricExpr": "tma_info_system_socket_clks / #num_dies / duration_time / 1e9",
> + "MetricGroup": "SoC",
> + "MetricName": "UNCORE_FREQ",
> + "Unit": "cpu_core"
> + },
> + {
> + "BriefDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execution ports for ALU operations.",
> + "MetricExpr": "(cpu_core@UOPS_DISPATCHED.PORT_0@ + cpu_core@UOPS_DISPATCHED.PORT_1@ + cpu_core@UOPS_DISPATCHED.PORT_5_11@ + cpu_core@UOPS_DISPATCHED.PORT_6@) / (5 * tma_info_core_core_clks)",
> + "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group",
> + "MetricName": "tma_alu_op_utilization",
> + "MetricThreshold": "tma_alu_op_utilization > 0.4",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_core"
> + },
> + {
> + "BriefDescription": "This metric estimates fraction of slots the CPU retired uops delivered by the Microcode_Sequencer as a result of Assists",
> + "MetricExpr": "78 * cpu_core@xxxxxxxxxxx@ / tma_info_thread_slots",
> + "MetricGroup": "TopdownL4;tma_L4_group;tma_microcode_sequencer_group",
> + "MetricName": "tma_assists",
> + "MetricThreshold": "tma_assists > 0.1 & (tma_microcode_sequencer > 0.05 & tma_heavy_operations > 0.1)",
> + "PublicDescription": "This metric estimates fraction of slots the CPU retired uops delivered by the Microcode_Sequencer as a result of Assists. Assists are long sequences of uops that are required in certain corner-cases for operations that cannot be handled natively by the execution pipeline. For example; when working with very small floating point values (so-called Denormals); the FP units are not set up to perform these operations natively. Instead; a sequence of instructions to perform the computation on the Denormals is injected into the pipeline. Since these microcode sequences might be dozens of uops long; Assists can be extremely deleterious to performance and they can be avoided in many cases. Sample with: ASSISTS.ANY",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_core"
> + },
> + {
> + "BriefDescription": "This metric estimates fraction of slots the CPU retired uops as a result of handing SSE to AVX* or AVX* to SSE transition Assists.",
> + "MetricExpr": "63 * cpu_core@ASSISTS.SSE_AVX_MIX@ / tma_info_thread_slots",
> + "MetricGroup": "HPC;TopdownL5;tma_L5_group;tma_assists_group",
> + "MetricName": "tma_avx_assists",
> + "MetricThreshold": "tma_avx_assists > 0.1",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_core"
> + },
> + {
> + "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend",
> + "MetricExpr": "cpu_core@topdown\\-be\\-bound@ / (cpu_core@topdown\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots",
> + "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
> + "MetricName": "tma_backend_bound",
> + "MetricThreshold": "tma_backend_bound > 0.2",
> + "MetricgroupNoGroup": "TopdownL1",
> + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_core"
> + },
> + {
> + "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations",
> + "MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + tma_retiring), 0)",
> + "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
> + "MetricName": "tma_bad_speculation",
> + "MetricThreshold": "tma_bad_speculation > 0.15",
> + "MetricgroupNoGroup": "TopdownL1",
> + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_core"
> + },
> + {
> + "BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction",
> + "MetricExpr": "cpu_core@topdown\\-br\\-mispredict@ / (cpu_core@topdown\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots",
> + "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM",
> + "MetricName": "tma_branch_mispredicts",
> + "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15",
> + "MetricgroupNoGroup": "TopdownL2",
> + "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: TOPDOWN.BR_MISPREDICT_SLOTS. Related metrics: tma_info_bad_spec_branch_misprediction_cost, tma_info_bottleneck_mispredictions, tma_mispredicts_resteers",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_core"
> + },
> + {
> + "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers",
> + "MetricExpr": "cpu_core@INT_MISC.CLEAR_RESTEER_CYCLES@ / tma_info_thread_clks + tma_unknown_branches",
> + "MetricGroup": "FetchLat;TopdownL3;tma_L3_group;tma_fetch_latency_group",
> + "MetricName": "tma_branch_resteers",
> + "MetricThreshold": "tma_branch_resteers > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)",
> + "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers. Branch Resteers estimates the Frontend delay in fetching operations from corrected path; following all sorts of miss-predicted branches. For example; branchy code with lots of miss-predictions might get categorized under Branch Resteers. Note the value of this node may overlap with its siblings. Sample with: BR_MISP_RETIRED.ALL_BRANCHES",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_core"
> + },
> + {
> + "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due staying in C0.1 power-performance optimized state (Faster wakeup time; Smaller power savings).",
> + "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.C01@ / tma_info_thread_clks",
> + "MetricGroup": "C0Wait;TopdownL4;tma_L4_group;tma_serializing_operation_group",
> + "MetricName": "tma_c01_wait",
> + "MetricThreshold": "tma_c01_wait > 0.05 & (tma_serializing_operation > 0.1 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_core"
> + },
> + {
> + "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due staying in C0.2 power-performance optimized state (Slower wakeup time; Larger power savings).",
> + "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.C02@ / tma_info_thread_clks",
> + "MetricGroup": "C0Wait;TopdownL4;tma_L4_group;tma_serializing_operation_group",
> + "MetricName": "tma_c02_wait",
> + "MetricThreshold": "tma_c02_wait > 0.05 & (tma_serializing_operation > 0.1 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_core"
> + },
> + {
> + "BriefDescription": "This metric estimates fraction of cycles the CPU retired uops originated from CISC (complex instruction set computer) instruction",
> + "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)",
> + "MetricGroup": "TopdownL4;tma_L4_group;tma_microcode_sequencer_group",
> + "MetricName": "tma_cisc",
> + "MetricThreshold": "tma_cisc > 0.1 & (tma_microcode_sequencer > 0.05 & tma_heavy_operations > 0.1)",
> + "PublicDescription": "This metric estimates fraction of cycles the CPU retired uops originated from CISC (complex instruction set computer) instruction. A CISC instruction has multiple uops that are required to perform the instruction's functionality as in the case of read-modify-write as an example. Since these instructions require multiple uops they may or may not imply sub-optimal use of machine resources. Sample with: FRONTEND_RETIRED.MS_FLOWS",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_core"
> + },
> + {
> + "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Machine Clears",
> + "MetricExpr": "(1 - tma_branch_mispredicts / tma_bad_speculation) * cpu_core@INT_MISC.CLEAR_RESTEER_CYCLES@ / tma_info_thread_clks",
> + "MetricGroup": "BadSpec;MachineClears;TopdownL4;tma_L4_group;tma_branch_resteers_group;tma_issueMC",
> + "MetricName": "tma_clears_resteers",
> + "MetricThreshold": "tma_clears_resteers > 0.05 & (tma_branch_resteers > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15))",
> + "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Machine Clears. Sample with: INT_MISC.CLEAR_RESTEER_CYCLES. Related metrics: tma_l1_bound, tma_machine_clears, tma_microcode_sequencer, tma_ms_switches",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_core"
> + },
> + {
> + "BriefDescription": "This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to contested accesses",
> + "MetricExpr": "(cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS@ * min(cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS@:retire_latency, 24 * tma_info_system_core_frequency) + cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD@ * min(cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD@:retire_latency, 25 * tma_info_system_core_frequency) * (cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM@ / (cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM@ + cpu_core@OCR.DEMAND_DATA_RDL3_HIT.SNOOP_HIT_WITH_FWD@))) * (1 + cpu_core@MEM_LOAD_RETIRED.FB_HIT@ / cpu_core@MEM_LOAD_RETIRED.L1_MISS@ / 2) / tma_info_thread_clks",

Thanks Weilin! The modifier on an event doesn't usually have the colon
when there is a slash. So:
cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD@:retire_latency
Should be:
cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD@retire_latency
The modifiers also tend to be a single character. The modifier 'R' is
unused, so I wonder we can use it here so that we can support these
events not just in metrics.

Ian