Re: [External] : [RFC PATCH v2 1/6] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
From: James Clark
Date: Thu Nov 24 2022 - 11:51:12 EST
On 24/11/2022 16:32, Jing Zhang wrote:
>
>
> On 2022/11/23 10:26 PM, James Clark wrote:
>>
>>
>> On 22/11/2022 15:41, Jing Zhang wrote:
>>>
>>>
>>> On 2022/11/22 10:00 PM, James Clark wrote:
>>>>
>>>>
>>>> On 21/11/2022 17:55, John Garry wrote:
>>>>> On 21/11/2022 15:17, Jing Zhang wrote:
>>>>>> I'm sorry that I misunderstood the purpose of putting the metric as
>>>>>> arch_std_event at first,
>>>>>> and now it works after the modification following your suggestion.
>>>>>>
>>>>>> But there are also a few questions:
>>>>>>
>>>>>> 1. The slot value used by the topdownL1 metrics varies between
>>>>>> architectures; for example,
>>>>>> the slot is 5 on neoverse-n2. If I put the topdownL1 metric as
>>>>>> arch_std_event, then I need to
>>>>>> specify the slot as 5 in n2. I can specify the slot value in a metric like
>>>>>> below, but is there a
>>>>>> more concise way to do this?
>>>>>>
>>>>>> diff --git
>>>>>> a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>>>> b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>>>> index 8ff1dfe..b473baf 100644
>>>>>> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>>>> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>>>> @@ -1,4 +1,23 @@
>>>>>> [
>>>>>> + {
>>>>>> + "MetricExpr": "5",
>>>>>> + "PublicDescription": "A pipeline slot represents the
>>>>>> hardware resources needed to process one uOp",
>>>>>> + "BriefDescription": "A pipeline slot represents the
>>>>>> hardware resources needed to process one uOp",
>>>>>> + "MetricName": "slot"
>>>>>
>>>>> Ehhh....I'm not sure if that is a good idea. Ian or anyone else have an
>>>>> opinion on this? It is possible to reuse metrics, so it should work, but...
>>>>>
>>>>> One problem is that "slot" would show up as a metric, which you would
>>>>> not want.
>>>>>
>>>>> Alternatively I was going to suggest that you can overwrite specific std
>>>>> arch event attributes. So for example of frontend_bound, you could have:
>>>>
>>>> I would agree with not having this and just hard coding the 5 wherever
>>>> it's needed. Once we have a few different sets of metrics in place maybe
>>>> we can start to look at deduplication, but for now I don't see the value.
>>>>
>>>>>
>>>>> + b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>>> @@ -0,0 +1,30 @@
>>>>> [
>>>>> {
>>>>> "ArchStdEvent": "FRONTEND_BOUND",
>>>>> "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 *
>>>>> cpu_cycles)",
>>>>> },
>>>>>
>>>>>> + }
>>>>>> + {
>>>>>> + "ArchStdEvent": "FRONTEND_BOUND"
>>>>>> + },
>>>>>> + {
>>>>>> + "ArchStdEvent": "BACKEND_BOUND"
>>>>>> + },
>>>>>> + {
>>>>>> + "ArchStdEvent": "WASTED"
>>>>>> + },
>>>>>> + {
>>>>>> + "ArchStdEvent": "RETIRING"
>>>>>> + },
>>>>>>
>>>>>>
>>>>>> 2. Should I add the topdownL1 metric to
>>>>>> tools/perf/pmu-event/recommended.json,
>>>>>> or create a new json file to place the general metric?
>>>>>
>>>>> It would not belong in recommended.json as that is specifically for
>>>>> arch-recommended events. It would really just depend on where the value
>>>>> comes from, i.e. arm arm or sbsa.
>>>>>
>>>>
>>>> For what we're going to publish shortly we'll be generating a
>>>> metrics.json file for each CPU. It will be autogenerated so I don't
>>>> think duplication will be an issue and I'm expecting that there will be
>>>> differences in the topdown metrics between CPUs anyway. So I would also
>>>> vote to not put it in recommended.json
>>>>
>>>
>>> I will create a new sbsa.json file in tools/perf/pmu-events/arch/arm64/
>>> to place metrics that may be common between some CPUs, just like arch_std_event.
>>
>> Because this would apply to all CPUs rather than just N2, I still think
>> it's best to wait for our metrics repo to be published. Otherwise Arm
>> will start publishing metrics with names and group names for all future
>> CPUs that have different names to the common ones added as part of this
>> change.
>>
>> It's something that we've been working on for quite a while and we've
>> taken care to make sure that it applies to future products and is scalable.
>>
>> It would be easier to add these right now only for N2, and then
>> afterwards we can start to look at what is common and could be factored
>> out into the top level folder.
>>
>>> If the topdown metrics are different in other CPUs, we can overwrite the
>>> metric expression.
>>
>> True, but with different group names and metric names and units it could
>> get slightly complicated.
>>
>>>
>>> For example:
>>>
>>> +++ b/tools/perf/pmu-events/arch/arm64/sbsa.json
>>> @@ -0,0 +1,9 @@
>>> +[
>>> + {
>>> + "MetricExpr": "stall_slot_frontend / (slot * cpu_cycles)",
>>> + "PublicDescription": "Frontend bound L1 topdown metric",
>>> + "BriefDescription": "Frontend bound L1 topdown metric",
>>> + "MetricGroup": "TopDownL1",
>>> + "MetricName": "FRONTEND_BOUND"
>>> + }
>>> +]
>>>
>>> + b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>> @@ -0,0 +1,30 @@
>>> +[
>>> + {
>>> + "ArchStdEvent": "FRONTEND_BOUND",
>>> + "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)"
>>> + }
>>> +]
>>>
>>
>> With the auto generation of metrics file I don't really see too much
>> benefit of doing it this way.
>>
>> You also run into the issue where if a platform happens to define all of
>> the events required by a metric, will that metric appear automatically,
>> even if it's not valid?
>>
>
> Ok, I agree to put the topdown metric in the n2 metric instead of arch_std_event.
> There is no unified formula for the topdown metric currently, and the slots of each
> CPU may be different.
>
> After the standard is published in the future, please consider what John said, and
> use the general metric as arch_std_event.
Yep that sounds good, will do!
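
For reference, a minimal sketch of the two FRONTEND_BOUND expressions quoted
above, evaluated on made-up counter values. The function names and the sample
counts are hypothetical and only illustrate why the slot count (5 on N2) and
the per-CPU adjustment matter; this is not how perf evaluates MetricExpr.

```python
def frontend_bound_generic(stall_slot_frontend, cpu_cycles, slots):
    """Generic form quoted in the thread: stall_slot_frontend / (slot * cpu_cycles)."""
    return stall_slot_frontend / (slots * cpu_cycles)


def frontend_bound_n2(stall_slot_frontend, cpu_cycles, slots=5):
    """N2 override quoted in the thread:
    (stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles).
    The default of 5 slots is the neoverse-n2 value mentioned above."""
    return (stall_slot_frontend - cpu_cycles) / (slots * cpu_cycles)


# Hypothetical counter readings, chosen only for illustration.
stall_slot_frontend = 3_000_000
cpu_cycles = 1_000_000

generic = frontend_bound_generic(stall_slot_frontend, cpu_cycles, slots=5)
n2 = frontend_bound_n2(stall_slot_frontend, cpu_cycles)
```

With the same raw counts the two expressions give different fractions, which
is the reason a shared definition would need a per-CPU MetricExpr override.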