Re: [Questions] perf c2c: What's the current status of perf c2c?

From: Peter Zijlstra
Date: Wed Dec 09 2015 - 04:34:16 EST

Next message: Maxime Coquelin: "Re: [PATCH v3 2/9] Documentation: dt-bindings: Document STM32 pinctrl driver DT bindings"
Previous message: Vladimir Davydov: "Re: [PATCH 6/8] mm: memcontrol: move kmem accounting code to CONFIG_MEMCG"
In reply to: Jiri Olsa: "Re: [Questions] perf c2c: What's the current status of perf c2c?"
Next in thread: Peter Zijlstra: "Re: [Questions] perf c2c: What's the current status of perf c2c?"

On Wed, Dec 09, 2015 at 09:04:40AM +0100, Jiri Olsa wrote:
> On Wed, Dec 09, 2015 at 12:06:44PM +0800, Yunlong Song wrote:
> > Hi, Don,
> > I am interested in the perf c2c tool, which is introduced in: http://lwn.net/Articles/588866/
> > However, I found that this tool has not been applied to the mainline tree of perf, Why? It was first
> > introduced in Feb. 2014. What's its current status now? Does it have a new version or a repository
> > somewhere else? And does it support Haswell?
>
> hi,
> not sure Don made any progress on this field, but I'm having
> his c2c sources rebased on current perf sources ATM.
>
> I changed the tool a little to run over new DATALA events
> added in Haswell (in addition to ldlat events) and it seems
> to work.
>
> the plan for me is to use it some more to prove it's useful
> and kick it to be merged with perf at some point

So I never really liked the c2c tool because it was so narrowly
focussed; it only works on NUMA thingies IIRC.

I would much rather see a tool that uses PEBS events and does a dwarf
decode of the exact instruction's data reference -- without relying on
data address bits.

That is, suppose we measure LLC_MISS: even if we have a
data address, as soon as it's inside a dynamically allocated object,
you're lost.

However, since we have the exact instruction we can simply look at that.
Imagine something like:

	struct foo {
		int blah;
		int val;
		int array[];
	};

	struct bar {
		struct foo *foo;
	};

	int foobar(struct bar *bar)
	{
		return bar->foo->val;
	}

Which we can imagine could result in code like:

	foobar:
		mov (%rax), %rax	# load bar::foo
		mov 4(%rax), %eax	# load foo::val

And DWARF should know this, so by knowing the instruction we can know
which load missed the cache.
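
A minimal sketch of that lookup step, assuming the instruction decode has
already given us the displacement of the faulting load and the type of the
object behind the base register. The member table here is hand-built with
offsetof() as a stand-in for what a DWARF reader would provide; struct
member_info and member_at() are hypothetical names, not from any existing
tool:

```c
#include <stddef.h>

struct foo {
	int blah;
	int val;
	int array[];
};

/* Hypothetical stand-in for member info a DWARF reader would hand us. */
struct member_info {
	const char *name;
	size_t offset;
	size_t size;
};

static const struct member_info foo_members[] = {
	{ "blah", offsetof(struct foo, blah), sizeof(int) },
	{ "val",  offsetof(struct foo, val),  sizeof(int) },
};

/*
 * Map the displacement taken from the sampled load instruction to the
 * member it touches. Returns NULL if no fixed-size member covers it
 * (for example, an access into the trailing array[]).
 */
static const char *member_at(const struct member_info *m, size_t n,
			     size_t disp)
{
	for (size_t i = 0; i < n; i++) {
		if (disp >= m[i].offset && disp < m[i].offset + m[i].size)
			return m[i].name;
	}
	return NULL;
}
```

With the second load from the example above, the displacement is
offsetof(struct foo, val), so member_at() reports "val" as the member
whose load missed.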

Once you have this information, you can use pahole-like structure output
and heat-colour it or whatnot. Bright red if you miss lots, etc.
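
Something along these lines, as a rough sketch only: the output format is
loosely modelled on pahole, the thresholds are arbitrary, and struct
member_heat, heat_label() and print_heat() are made-up names for
illustration. The per-member miss counts are assumed to come from the
sample attribution described above:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-member record: layout plus attributed miss samples. */
struct member_heat {
	const char *name;
	size_t offset;
	size_t size;
	unsigned long misses;
};

/* Arbitrary thresholds: >= 50% of all misses is hot, >= 10% is warm. */
static const char *heat_label(unsigned long misses, unsigned long total)
{
	if (total == 0 || misses == 0)
		return "cold";
	if (misses * 2 >= total)
		return "HOT";
	if (misses * 10 >= total)
		return "warm";
	return "cold";
}

/* Print a pahole-like layout with a heat tag per member. */
static void print_heat(const char *type, const struct member_heat *m,
		       size_t n)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++)
		total += m[i].misses;

	printf("struct %s {\n", type);
	for (size_t i = 0; i < n; i++)
		printf("\t%-8s /* %4zu %4zu */  misses: %lu (%s)\n",
		       m[i].name, m[i].offset, m[i].size, m[i].misses,
		       heat_label(m[i].misses, total));
	printf("};\n");
}
```

A real tool would presumably use terminal colours instead of text labels;
the labels just keep the sketch independent of the terminal.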

Now currently this is possible but a bit of work because the DWARF
annotations are not exactly following these data types; that is, you
might need to decode previous instructions and infer some bits.

I think Stephane was working with GCC people to allow more/better DWARF
annotations and allow easier retrieval of this information.

Note: the proposed scheme still has some holes in it; imagine trying to
load an array[] member like:

	mov 8(%rax, %rcx, 4), %rcx

This would load the array element indexed by RCX into RCX, thereby
destroying the index. In this case knowing the data address you can
still compute the index if you also know RAX (which you get from the
PEBS register dump).
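
The arithmetic is just the addressing mode run backwards: the effective
address is base + disp + index * scale, so index = (addr - base - disp) /
scale. A small sketch, where recover_index() is a hypothetical helper and
the base value is assumed to come from the PEBS register dump:

```c
#include <stdint.h>

/*
 * Recover the index register value of a base+disp+index*scale access
 * from the sampled data address and the base register (e.g. RAX).
 * Returns 0 and stores the index on success, -1 if the inputs are
 * inconsistent with that addressing mode.
 */
static int recover_index(uint64_t data_addr, uint64_t base, uint64_t disp,
			 uint64_t scale, uint64_t *index)
{
	uint64_t off;

	if (scale == 0 || data_addr < base + disp)
		return -1;

	off = data_addr - base - disp;
	if (off % scale)
		return -1;

	*index = off / scale;
	return 0;
}
```

For the instruction above, disp is 8 and scale is 4, so a sampled data
address of RAX + 28 gives back index 5.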

