Re: [PATCH v5 11/24] perf vendor events: Update/add Graniterapids events/metrics

From: Liang, Kan
Date: Thu Feb 06 2025 - 12:11:24 EST




On 2025-02-06 11:40 a.m., Ian Rogers wrote:
> On Thu, Feb 6, 2025 at 6:32 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
>>
>> On 2025-02-05 4:33 p.m., Ian Rogers wrote:
>>> On Wed, Feb 5, 2025 at 1:10 PM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
>>>>
>>>> On 2025-02-05 3:23 p.m., Ian Rogers wrote:
>>>>> On Wed, Feb 5, 2025 at 11:11 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> On 2025-02-05 12:31 p.m., Ian Rogers wrote:
>>>>>>> + {
>>>>>>> + "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired",
>>>>>>> + "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * slots",
>>>>>>> + "MetricGroup": "BvUW;TmaL1;TopdownL1;tma_L1_group",
>>>>>>> + "MetricName": "tma_retiring",
>>>>>>> + "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
>>>>>>> + "MetricgroupNoGroup": "TopdownL1",
>>>>>>> + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
>>>>>>> + "ScaleUnit": "100%"
>>>>>>> + },
>>>>>>
>>>>>> The "Default" tag is missed for GNR as well.
>>>>>> It seems the new CPUIDs are not added in the script?
>>>>>
>>>>> Spotted it. We need to manually say which architectures with TopdownL1
>>>>> should be in Default, because it was insisted that pre-Icelake
>>>>> CPUs with TopdownL1 not have TopdownL1 in Default. As you know, my
>>>>> preference would be to always put TopdownL1 metrics into Default.
>>>>>
>>>>
>>>> For future platforms, there should always be at least TopdownL1
>>>> support. Intel even adds extra fixed counters for the TopdownL1 events.
>>>>
>>>> Maybe the script should be changed to only mark the old pre-Icelake
>>>> platforms as having no TopdownL1 Default. For the other platforms,
>>>> always add TopdownL1 as Default. That would avoid manually adding it
>>>> for every new platform.
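
For illustration, a rough sketch of what that could look like in the
metric-generation script; the helper name and the model list below are
made up, not the actual converter code:

# Only the legacy pre-Icelake models keep TopdownL1 out of Default;
# every newer model gets it automatically, so new CPUs need no edits.
PRE_ICELAKE_NO_DEFAULT = {"broadwell", "haswell", "skylake", "skylakex", "cascadelakex"}

def metric_groups(model: str, groups: list[str]) -> list[str]:
    """Add 'Default' to TopdownL1 metrics on everything but the legacy models."""
    if "TopdownL1" in groups and model not in PRE_ICELAKE_NO_DEFAULT:
        return groups + ["Default"]
    return groups
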
>>>
>>> That's fair. What about TopdownL2, which is currently only in the
>>> Default set for SPR?
>>>
>>
>> Yes, TopdownL2 is a bit tricky, as it requires many more events.
>> Could you please set it just for SPR/EMR/GNR for now?
>>
>> I will ask around internally and come up with a long-term solution for
>> TopdownL2.
>
> Thanks Kan, I've updated the script in the existing way for now. Thomas
> saw another issue with TSC, which is also fixed. I'm trying to
> understand what happened with it before sending out v6:
> https://lore.kernel.org/lkml/4f42946ffdf474fbf8aeaa142c25a25ebe739b78.camel@xxxxxxxxx/
> """
> There are also some errors like this,
>
> Testing tma_cisc
> Metric contains missing events
> Cannot resolve IDs for tma_cisc: cpu_atom@TOPDOWN_FE_BOUND.CISC@ / (5
> * cpu_atom@CPU_CLK_UNHALTED.CORE@)
> """
> But checking the json I wasn't able to spot a model with the metric
> and without these json events. Knowing the model would make my life
> easier :-)
>

The problem is likely caused by the fundamental Topdown metrics, e.g.,
tma_frontend_bound, since the MetricThreshold of tma_cisc references
the Topdown metrics.

$ ./perf stat -M tma_frontend_bound
Cannot resolve IDs for tma_frontend_bound:
cpu_atom@TOPDOWN_FE_BOUND.ALL@ / (8 * cpu_atom@CPU_CLK_UNHALTED.CORE@)
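
A quick way to confirm the dependency from the e-core JSON is something
like the sketch below; the path to the GNR metrics file is a guess, and
it only looks at the MetricThreshold strings:

import json
import re

# List cpu_atom metrics whose MetricThreshold references other tma_*
# metrics (the path to the GNR metrics file is assumed).
with open("tools/perf/pmu-events/arch/x86/graniterapids/gnr-metrics.json") as f:
    metrics = json.load(f)

for m in metrics:
    threshold = m.get("MetricThreshold", "")
    others = set(re.findall(r"tma_\w+", threshold)) - {m.get("MetricName")}
    if m.get("Unit") == "cpu_atom" and others:
        print(m["MetricName"], "depends on", sorted(others))

That should show which metrics, tma_cisc included, drag in the
fundamental Topdown metrics through their thresholds.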


The metric itself is correct.

+ "BriefDescription": "Counts the number of issue slots that were
not consumed by the backend due to frontend stalls.",
+ "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.ALL@ / (8 *
cpu_atom@CPU_CLK_UNHALTED.CORE@)",
+ "MetricGroup": "TopdownL1;tma_L1_group",
+ "MetricName": "tma_frontend_bound",
+ "MetricThreshold": "(tma_frontend_bound >0.20)",
+ "MetricgroupNoGroup": "TopdownL1",
+ "ScaleUnit": "100%",
+ "Unit": "cpu_atom"
+ },

However, when I dump the debug information with

./perf stat -M tma_frontend_bound -vvv

I get the debug output below. I have no idea where the slots event
comes from. It seems the perf tool mixes up the p-core metrics with the
e-core metrics. But why only slots? It looks like a bug in the perf tool.

found event cpu_atom@CPU_CLK_UNHALTED.CORE@
found event cpu_atom@TOPDOWN_FE_BOUND.ALL@
found event slots
Parsing metric events
'{cpu_atom/CPU_CLK_UNHALTED.CORE,metric-id=cpu_atom!3CPU_CLK_UNHALTED.CORE!3/,cpu_atom/TOPDOWN_FE_BOUND.ALL,metric-id=cpu_atom!3TOPDOWN_FE_BOUND.ALL!3/,slots/metric-id=slots/}:W'
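
To check whether the stray slots event really comes from the p-core
copy of the same metric name, something like this might help (same
assumed path as above):

import json

# Compare the cpu_core and cpu_atom definitions of tma_frontend_bound;
# only the cpu_core one is expected to mention the slots event.
with open("tools/perf/pmu-events/arch/x86/graniterapids/gnr-metrics.json") as f:
    metrics = json.load(f)

for m in metrics:
    if m.get("MetricName") == "tma_frontend_bound":
        print(m.get("Unit", "cpu_core"), "uses slots:", "slots" in m["MetricExpr"])

If only the cpu_core definition mentions slots, that would back up the
mix-up theory.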


Thanks,
Kan