Re: [BUG REPORT] perf tools: x86_64: weight column displays odd memory latency data

From: Liang, Kan
Date: Fri Jun 03 2022 - 10:03:42 EST




On 6/1/2022 1:03 AM, Tyson Thomas wrote:
Hi Kan,

Thanks for the reply.

The exact command used was: perf mem report

The columns are in order: Overhead, Samples, Local Weight, Memory Access.
The results I sent before are: Overhead, Local Weight and Memory.

So, to update the table from before, each tuple has a sample of 1, so it is:

0.02%,1 ,62515 ,L1 or L1 hit
0.02%,1 ,54048 ,L1 or L1 hit
0.02%,1 ,52206 ,L1 or L1 hit
0.02%,1 ,49831 ,L1 or L1 hit
0.02%,1 ,49056 ,Local RAM or RAM hit
0.01%,1 ,40666 ,LFB or LFB hit
0.01%,1 ,38080 ,L1 or L1 hit
0.01%,1 ,36772 ,L1 or L1 hit

So, the weights are absurdly high here.

I'll give a reply on perf report -D shortly.

Regards,
Tyson



From: Liang, Kan <kan.liang@xxxxxxxxxxxxxxx>
Sent: Wednesday, 25 May 2022 12:58 AM
To: Tyson Thomas <tyson.thomas@xxxxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx <linux-kernel@xxxxxxxxxxxxxxx>; linux-perf-users@xxxxxxxxxxxxxxx <linux-perf-users@xxxxxxxxxxxxxxx>
Subject: Re: [BUG REPORT] perf tools: x86_64: weight column displays odd memory latency data


On 5/22/2022 6:14 PM, Tyson Thomas wrote:
Hi Kan, Linux-Perf Team

I have observed some odd behaviour within perf when using perf-mem. Specifically the reported latency under the weight column appears to be unreasonably high.

Here is a given sample from a recent test, I find that some of the  latencies are close to an unsigned short and I cannot seem to be understand why that would be outside of it being an issue with the perf events.

This can be replicated using a NAS benchmark, specifically cg.D.


I tried the cg.D benchmark on my machine, but I cannot reproduce the issue. The weight looks reasonable.

$ perf mem record ./bin/cg.D.x


NAS Parallel Benchmarks (NPB3.4-OMP) - CG Benchmark

Size: 1500000
Iterations: 100
Number of available threads: 4

^C[ perf record: Woken up 118 times to write data ]
[ perf record: Captured and wrote 30.108 MB perf.data (438307 samples) ]

$ perf mem report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 209K of event 'cpu/mem-loads,ldlat=30/Pu'
# Total weight : 56835744
# Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,p_stage_cyc
#
# Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol >
# ........ ............ ............ ........................ ................................ .................... ............>
#
0.01% 66 53 L1 or L1 hit [.] randlc_ cg.D.x [.] 0x00007f>
0.01% 53 58 L1 or L1 hit [.] randlc_ cg.D.x [.] 0x00007f>
0.01% 59 52 L1 or L1 hit [.] randlc_ cg.D.x [.] 0x00007f>
0.00% 53 53 L1 or L1 hit [.] randlc_ cg.D.x [.] 0x00007f>
0.00% 43 65 L1 or L1 hit [.] randlc_ cg.D.x [.] 0x00007f>
0.00% 50 55 L1 or L1 hit [.] randlc_ cg.D.x [.] 0x00007f>
0.00% 51 50 L1 or L1 hit [.] randlc_ cg.D.x [.] 0x00007f>
0.00% 46 55 L1 or L1 hit [.] randlc_ cg.D.x [.] 0x00007f>
0.00% 49 51 L1 or L1 hit [.] randlc_ cg.D.x [.] 0x00007f>
0.00% 1 2496 LFB or LFB hit [.] sparse_ cg.D.x [.] 0x00007f>
0.00% 48 52 L1 or L1 hit [.] randlc_ cg.D.x [.] 0x00007f>
0.00% 1 2441 Local RAM or RAM hit [.] sparse_ cg.D.x [.] 0x00007f>
0.00% 1 2420 LFB or LFB hit [.] sparse_ cg.D.x [.] 0x00007f>
0.00% 1 2415 LFB or LFB hit [.] sparse_ cg.D.x [.] 0x00007f>
0.00% 1 2403 LFB or LFB hit [.] sparse_ cg.D.x [.] 0x00007f>
0.00% 1 2399 LFB or LFB hit [.] sparse_ cg.D.x [.] 0x00007f>
0.00% 1 2390 LFB or LFB hit [.] sparse_ cg.D.x [.] 0x00007f>
0.00% 1 2385 LFB or LFB hit [.] sparse_ cg.D.x [.] 0x00007f>
0.00% 1 2385 LFB or LFB hit [.] sparse_ cg.D.x [.] 0x00007f>
0.00% 1 2378 LFB or LFB hit [.] sparse_ cg.D.x [.] 0x00007f>
0.00% 1 2358 LFB or LFB hit [.] sparse_ cg.D.x [.] 0x00007f>

Thanks,
Kan


I observe the following results in perf mem report (just getting the top 10 results)



Could you please show me the exact perf command used?

With my perf mem report,
The first column is the Overhead.
The second column is the number of samples.
The third column is the weight.
The fourth is the Memory access.

Seems like the weight is missed?

Could you please check the perf report -D?
It will dump the weight for each Sample.
Does it look correct?

Thanks,
Kan

   0.02% ,62515 ,L1 or L1 hit
   0.02% ,54048 ,L1 or L1 hit
   0.02% ,52206 ,L1 or L1 hit
   0.02% ,49831 ,L1 or L1 hit
   0.02% ,49056 ,Local RAM or RAM hit
   0.01% ,40666 ,LFB or LFB hit
   0.01% ,38080 ,L1 or L1 hit
   0.01% ,36772 ,L1 or L1 hit
   0.01% ,36729 ,LFB or LFB hit
   0.01% ,27101 ,LFB or LFB hit

Is it possible for someone to shed some light on this or am I misunderstanding how the weight column is used here?
This appears to have been an issue on 5.4, 5.10 and 5.15. I am looking into seeing if it is still present in 5.17 and 5.18.

I've also tried this on different Intel CPUs such as Intel Xeon 6230, i5-1135G7, Intel Xeon 6330

Any insight or help would be appreciated,
Tyson