Re: [PATCH 6/9] perf report: Support instruction latency

From: Liang, Kan
Date: Mon Feb 08 2021 - 08:52:22 EST




On 2/6/2021 3:09 AM, Namhyung Kim wrote:
> On Fri, Feb 5, 2021 at 11:38 PM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
>>
>> On 2/5/2021 6:08 AM, Namhyung Kim wrote:
>>> On Wed, Feb 3, 2021 at 5:14 AM <kan.liang@xxxxxxxxxxxxxxx> wrote:
>>>>
>>>> From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
>>>>
>>>> The instruction latency information can be recorded on some platforms,
>>>> e.g., the Intel Sapphire Rapids server. With both memory latency
>>>> (weight) and the new instruction latency information, users can easily
>>>> locate the expensive load instructions, and also understand the time
>>>> spent in different stages. The users can optimize their applications
>>>> in different pipeline stages.
>>>>
>>>> The 'weight' field is shared among different architectures. Reusing the
>>>> 'weight' field may impact other architectures. Add a new field to store
>>>> the instruction latency.
>>>>
>>>> Like the 'weight' support, introduce a 'ins_lat' for the global
>>>> instruction latency, and a 'local_ins_lat' for the local instruction
>>>> latency version.
>>>
>>> Could you please clarify the difference between the global latency
>>> and the local latency?
>>>
>>
>> The global means the total latency.
>> The local means average latency, aka total / number of samples.
>
> Thanks for the explanation, but I think it's confusing.
> Why not call it just total_latency and avg_latency?
>


The instruction latency field is an extension of the weight field, so I followed the same naming scheme. I still think we should keep the naming consistent.

To address the confusion, I think we can update the documentation for both the weight and the instruction latency fields.
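
To make the intended meaning concrete, here is a minimal sketch of the two values as described earlier in this thread (global = total, local = total / number of samples). It is an illustration only, not perf code: the `HistEntry` class and its member names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HistEntry:
    """Hypothetical stand-in for one aggregated report entry (not perf's struct)."""
    total_latency: int = 0  # running sum over all samples merged into this entry
    nr_samples: int = 0     # number of samples merged into this entry

    def add_sample(self, latency: int) -> None:
        """Merge one sample's latency into the entry."""
        self.total_latency += latency
        self.nr_samples += 1

    @property
    def global_latency(self) -> int:
        """'global': the total latency sum."""
        return self.total_latency

    @property
    def local_latency(self) -> float:
        """'local': the average latency per sample (0.0 when there are no samples)."""
        if self.nr_samples == 0:
            return 0.0
        return self.total_latency / self.nr_samples


entry = HistEntry()
for lat in (10, 20, 60):
    entry.add_sample(lat)

print(entry.global_latency)  # 90
print(entry.local_latency)   # 30.0
```

With three samples of 10, 20 and 60 cycles, the global value is the sum (90) and the local value is the per-sample average (30).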

How about the below patch?

From d5e80f541cb7288b24a7c5661ae5faede4747807 Mon Sep 17 00:00:00 2001
From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
Date: Mon, 8 Feb 2021 05:27:03 -0800
Subject: [PATCH] perf documentation: Add comments to the local/global weight related fields

The current 'local' and 'global' prefixes are confusing for the weight
related fields, e.g., weight and instruction latency.

Add comments to clarify.
'global' means the total weight/instruction latency sum.
'local' means the average weight/instruction latency per sample.

Signed-off-by: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
---
tools/perf/Documentation/perf-report.txt | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index f546b5e..acc1c1d 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -92,8 +92,9 @@ OPTIONS
- srcfile: file name of the source file of the samples. Requires dwarf
information.
- weight: Event specific weight, e.g. memory latency or transaction
- abort cost. This is the global weight.
- - local_weight: Local weight version of the weight above.
+ abort cost. This is the global weight (total weight sum).
+ - local_weight: Local weight (average weight per sample) version of the
+ weight above.
- cgroup_id: ID derived from cgroup namespace device and inode numbers.
- cgroup: cgroup pathname in the cgroupfs.
- transaction: Transaction abort flags.
@@ -110,8 +111,9 @@ OPTIONS
--time-quantum (default 100ms). Specify with overhead and before it.
- code_page_size: the code page size of sampled code address (ip)
- ins_lat: Instruction latency in core cycles. This is the global instruction
- latency
- - local_ins_lat: Local instruction latency version
+ latency (total instruction latency sum)
+ - local_ins_lat: Local instruction latency (average instruction latency per
+ sample) version

By default, comm, dso and symbol keys are used.
(i.e. --sort comm,dso,symbol)
--
2.7.4


Thanks,
Kan