Re: [PATCH 02/21] perf c2c: Dump raw records, decode data_src bits

From: Don Zickus
Date: Tue Feb 18 2014 - 22:05:29 EST


On Tue, Feb 18, 2014 at 01:53:35PM +0100, Jiri Olsa wrote:
> On Mon, Feb 10, 2014 at 12:28:57PM -0500, Don Zickus wrote:
> > From: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> >
> > From the c2c prototype:
> >
> > [root@sandy ~]# perf c2c -r report | head -7
> > T Status Pid Tid CPU Inst Adrs Virt Data Adrs Phys Data Adrs Cycles Source Decoded Source ObJect:Symbol
> > --------------------------------------------------------------------------------------------------------------------------------------------
> > raw input 779 779 7 0xffffffff810865dd 0xffff8803f4d75ec8 0 370 0x68080882 [LOAD,LCL_LLC,MISS,SNP NA] [kernel.kallsyms]:try_to_wake_up
> > raw input 779 779 7 0xffffffff8107acb3 0xffff8802a5b73158 0 297 0x6a100142 [LOAD,L1,HIT,SNP NONE,LOCKED] [kernel.kallsyms]:up_read
> > raw input 779 779 7 0x3b7e009814 0x7fff87429ea0 0 925 0x68100142 [LOAD,L1,HIT,SNP NONE] ???:???
> > raw input 0 0 1 0xffffffff8108bf81 0xffff8803eafebf50 0 172 0x68800842 [LOAD,LCL_LLC,HIT,SNP HITM] [kernel.kallsyms]:update_stats_wait_end
> > raw input 779 779 7 0x3b7e0097cc 0x7fac94b69068 0 228 0x68100242 [LOAD,LFB,HIT,SNP NONE] ???:???
> > [root@sandy ~]#
> >
> > The "Phys Data Adrs" column is not available at this point.
>
> SNIP
>
> > + sample->data_src,
> > + data_src,
> > + al->map ? (al->map->dso ? al->map->dso->long_name : "???") : "???",
> > + al->sym ? al->sym->name : "???");
> > +}
> > +
> > +static int perf_c2c__process_load_store(struct perf_c2c *c2c,
> > + struct perf_sample *sample,
> > + struct addr_location *al)
> > +{
> > + if (c2c->raw_records)
> > + perf_sample__fprintf(sample, ' ', "raw input", al, stdout);
> > +
> > return 0;
> > }
> >
> > static const struct perf_evsel_str_handler handlers[] = {
> > - { "cpu/mem-loads,ldlat=30/pp", perf_c2c__process_load, },
> > - { "cpu/mem-stores/pp", perf_c2c__process_store, },
> > + { "cpu/mem-loads,ldlat=30/pp", perf_c2c__process_load_store, },
> > + { "cpu/mem-stores/pp", perf_c2c__process_load_store, },
>
> hm.. so it's only one function for both handlers.. no need
> to use handlers at all then, right?

I implemented them separately at first, but once everything was working I
realized they were identical, so I combined them again. I keep thinking
there has to be some advantage to keeping them separate, but I haven't found
a use case.

You still need the handlers, though, in case other events get added into the
mix and you want this tool to filter them out.
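
As a rough illustration of the filtering point: the standalone sketch below
is not the actual perf code. The types and the names find_handler() and
dispatch_sample() are hypothetical stand-ins, and only the two event strings
come from the patch. It shows how a name-keyed handler table lets one
function serve both events while samples from any other event are skipped.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct perf_sample; not the real perf type. */
struct sample {
	unsigned long long addr;
};

typedef int (*sample_handler_t)(const struct sample *s);

/* Hypothetical analogue of struct perf_evsel_str_handler. */
struct str_handler {
	const char *name;
	sample_handler_t handler;
};

/* Counts samples that reached the shared handler. */
static int processed;

/* One handler shared by loads and stores, as in the patch. */
static int process_load_store(const struct sample *s)
{
	(void)s;
	processed++;
	return 0;
}

static const struct str_handler handlers[] = {
	{ "cpu/mem-loads,ldlat=30/pp", process_load_store, },
	{ "cpu/mem-stores/pp",         process_load_store, },
};

/* Return the handler registered for an event name, or NULL if none. */
static sample_handler_t find_handler(const char *evname)
{
	size_t i;

	for (i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++)
		if (!strcmp(handlers[i].name, evname))
			return handlers[i].handler;
	return NULL;
}

/* Samples from events without a handler are silently filtered out. */
static int dispatch_sample(const char *evname, const struct sample *s)
{
	sample_handler_t h = find_handler(evname);

	return h ? h(s) : 0;
}
```

With this shape, a sample from an unrelated event such as "cycles" never
reaches process_load_store(), even if it ends up in the same data file.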

However, I do have the problem of trying to figure out a good way to
dynamically adjust the '30' above. Seeing that Intel doesn't publish
L1, LFB and L2 latency numbers, we have been guessing at 30 cycles for an
LLC hit. It would probably be nice to adjust that on the command line as
opposed to recompiling. Small issue.
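
One possible shape for that, sketched under assumptions: parse_ldlat() and
format_load_event() are hypothetical helper names, no such option exists in
the patch, and the 16-bit upper bound on the threshold is an assumption of
this sketch. The idea is to parse the threshold from a command-line string
and build the event name at run time instead of hard-coding '30'.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Default taken from the hard-coded value in the handler table. */
#define DEFAULT_LDLAT 30

/*
 * Parse a load-latency threshold given on the command line.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 * The 1..0xffff range is an assumption made for this sketch.
 */
static int parse_ldlat(const char *arg, unsigned int *ldlat)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(arg, &end, 10);
	if (errno || end == arg || *end != '\0' || val == 0 || val > 0xffff)
		return -1;

	*ldlat = (unsigned int)val;
	return 0;
}

/*
 * Build the load event string for a given threshold.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int format_load_event(char *buf, size_t len, unsigned int ldlat)
{
	int n = snprintf(buf, len, "cpu/mem-loads,ldlat=%u/pp", ldlat);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Since the handler table is keyed on the full event string, the name used
for the lookup would have to be built from the same value, otherwise the
load samples would no longer match their handler.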

Cheers,
Don