Re: [PATCH v5 11/24] perf vendor events: Update/add Graniterapids events/metrics
From: Liang, Kan
Date: Thu Feb 06 2025 - 13:57:51 EST
On 2025-02-06 12:36 p.m., Ian Rogers wrote:
> On Thu, Feb 6, 2025 at 9:11 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
>
>>
>>
>> On 2025-02-06 11:40 a.m., Ian Rogers wrote:
>>> On Thu, Feb 6, 2025 at 6:32 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
>>>>
>>>> On 2025-02-05 4:33 p.m., Ian Rogers wrote:
>>>>> On Wed, Feb 5, 2025 at 1:10 PM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> On 2025-02-05 3:23 p.m., Ian Rogers wrote:
>>>>>>> On Wed, Feb 5, 2025 at 11:11 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> On 2025-02-05 12:31 p.m., Ian Rogers wrote:
>>>>>>>>> + {
>>>>>>>>> + "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired",
>>>>>>>>> + "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * slots",
>>>>>>>>> + "MetricGroup": "BvUW;TmaL1;TopdownL1;tma_L1_group",
>>>>>>>>> + "MetricName": "tma_retiring",
>>>>>>>>> + "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
>>>>>>>>> + "MetricgroupNoGroup": "TopdownL1",
>>>>>>>>> + "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
>>>>>>>>> + "ScaleUnit": "100%"
>>>>>>>>> + },
>>>>>>>>
>>>>>>>> The "Default" tag is missing for GNR as well.
>>>>>>>> It seems the new CPUIDs were not added in the script?
>>>>>>>
>>>>>>> Spotted it, we need to manually say which architectures with TopdownL1
>>>>>>> should be in Default because it was insisted upon that pre-Icelake
>>>>>>> CPUs with TopdownL1 not have TopdownL1 in Default. As you know, my
>>>>>>> preference would be to always put TopdownL1 metrics into Default.
>>>>>>>
>>>>>>
>>>>>> For future platforms, there should always be at least TopdownL1
>>>>>> support. Intel even adds extra fixed counters for the TopdownL1 events.
>>>>>>
>>>>>> Maybe the script should be changed to only mark the old pre-Icelake
>>>>>> platforms as having no TopdownL1 Default. For the other platforms,
>>>>>> always add TopdownL1 as Default. That would avoid manually adding it
>>>>>> for every new platform.
>>>>>
>>>>> That's fair. What about TopdownL2, which is currently only in the
>>>>> Default set for SPR?
>>>>>
>>>>
>>>> Yes, TopdownL2 is a bit tricky, since it requires many more events.
>>>> Could you please set it just for SPR/EMR/GNR for now?
>>>>
>>>> I will ask around internally and work out a long-term solution for
>>>> TopdownL2.
>>>
>>> Thanks Kan, I've updated the script in the existing way for now. Thomas
>>> saw another issue with TSC, which is also fixed. I'm trying to
>>> understand what happened with it before sending out v6:
>>>
>> https://lore.kernel.org/lkml/4f42946ffdf474fbf8aeaa142c25a25ebe739b78.camel@xxxxxxxxx/
>>> """
>>> There are also some errors like this,
>>>
>>> Testing tma_cisc
>>> Metric contains missing events
>>> Cannot resolve IDs for tma_cisc: cpu_atom@TOPDOWN_FE_BOUND.CISC@ / (5
>>> * cpu_atom@CPU_CLK_UNHALTED.CORE@)
>>> """
>>> But checking the json I wasn't able to spot a model with the metric
>>> and without these json events. Knowing the model would make my life
>>> easier :-)
>>>
>>
>> The problem is likely caused by the fundamental Topdown metrics, e.g.
>> tma_frontend_bound, since the MetricThreshold of tma_cisc depends on
>> those Topdown metrics.
>>
>> $ ./perf stat -M tma_frontend_bound
>> Cannot resolve IDs for tma_frontend_bound:
>> cpu_atom@TOPDOWN_FE_BOUND.ALL@ / (8 * cpu_atom@CPU_CLK_UNHALTED.CORE@)
>>
>>
>> The metric itself is correct.
>>
>> + "BriefDescription": "Counts the number of issue slots that were not consumed by the backend due to frontend stalls.",
>> + "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.ALL@ / (8 * cpu_atom@CPU_CLK_UNHALTED.CORE@)",
>> + "MetricGroup": "TopdownL1;tma_L1_group",
>> + "MetricName": "tma_frontend_bound",
>> + "MetricThreshold": "(tma_frontend_bound >0.20)",
>> + "MetricgroupNoGroup": "TopdownL1",
>> + "ScaleUnit": "100%",
>> + "Unit": "cpu_atom"
>> + },
>>
>> However, when I dump the debug information with
>> ./perf stat -M tma_frontend_bound -vvv
>>
>> I get the output below. I have no idea where the slots event comes from.
>> It seems the perf code mixes up the p-core metrics with the e-core
>> metrics. But why only slots?
>> It looks like a bug in the perf tool.
>>
>> found event cpu_atom@CPU_CLK_UNHALTED.CORE@
>> found event cpu_atom@TOPDOWN_FE_BOUND.ALL@
>> found event slots
>> Parsing metric events
>>
>> '{cpu_atom/CPU_CLK_UNHALTED.CORE,metric-id=cpu_atom!3CPU_CLK_UNHALTED.CORE!3/,cpu_atom/TOPDOWN_FE_BOUND.ALL,metric-id=cpu_atom!3TOPDOWN_FE_BOUND.ALL!3/,slots/metric-id=slots/}:W'
It's because perf adds "slots" as a tool event for the e-core Topdown
metrics.
There is no "slots" event for the e-core.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/metricgroup.c#n1481
I will check why the "slots" event is added as a tool event for the
e-core. That doesn't make sense.
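
Not a real patch, just a rough sketch of the kind of PMU-aware check I
would expect around that code. The helper names below are made up for
illustration; they are not the actual metricgroup.c functions:

#include <stdbool.h>
#include <string.h>

/*
 * Illustration only -- not the real tools/perf/util/metricgroup.c code.
 * The idea: before the "slots" event is appended for a Topdown metric,
 * look at the PMU the metric belongs to (the "Unit" field in the event
 * JSON, e.g. "cpu_atom"). Only the p-core PMU ("cpu_core", or "cpu" on
 * a non-hybrid system) exposes a slots event, so e-core metrics such as
 * tma_frontend_bound on cpu_atom must not get it.
 */
static bool pmu_has_slots(const char *pmu_name)
{
	return !pmu_name ||			/* non-hybrid: "cpu" */
	       !strcmp(pmu_name, "cpu") ||
	       !strcmp(pmu_name, "cpu_core");
}

/* Hypothetical guard for whatever code injects "slots" today. */
static bool metric_should_get_slots(const char *metric_pmu, bool is_topdown)
{
	return is_topdown && pmu_has_slots(metric_pmu);
}

Something keyed off the metric's "Unit" along those lines should stop
slots from being glued onto cpu_atom metrics, whether the real fix ends
up in metricgroup.c or in arch/x86/util/topdown.c.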
Thanks,
Kan
>>
>
> Some more clues for me, but still no model name :-)
> If this were in the metric json I'd expect the issue to be here:
> https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py#L1626
> but it appears the PMU code in perf is somehow injecting events. I wasn't
> aware this happened, but I don't see every change and my memory is also
> fallible. If the injection is happening, I'd expect it to be in:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/arch/x86/util/topdown.c?h=perf-tools-next
> or:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/metricgroup.c?h=perf-tools-next
> and I'm not seeing it. Could you help me debug, as I have no way to
> reproduce? Perhaps set a watchpoint on the number of entries in the evlist.
>
> Thanks,
> Ian
>
>
>
>>
>>
>> Thanks,
>> Kan
>>
>