Re: [RFC PATCH] perf: Add load latency monitoring on Intel Nehalem/Westmere

From: Stephane Eranian
Date: Thu Dec 23 2010 - 06:05:39 EST


On Thu, Dec 23, 2010 at 11:48 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Thu, 2010-12-23 at 11:31 +0100, Stephane Eranian wrote:
>> On Thu, Dec 23, 2010 at 11:18 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> > On Thu, 2010-12-23 at 16:59 +0800, Lin Ming wrote:
>> >> > {L1, L2, L3, RAM}x{snoop, local, remote}x{shared, exclusive} + {unknown,
>> >> > uncached, IO}
>> >> >
>> >> > Which takes all of 5 bits to encode.
>> >>
>> >> Do you mean below encoding?
>> >>
>> >> bits 4 3 2 1 0
>> >>      + + + + +
>> >>      | | | | |
>> >>      | | | {L1, L2, L3, RAM} or {unknown, uncached, IO}
>> >>      | | |
>> >>      | {snoop, local, remote, OTHER}
>> >>      |
>> >>      {shared, exclusive}
>> >>
>> >> If bits(2-3) is OTHER, then bits(0-1) is the encoding of {unknown,
>> >> uncached, IO}.
>> >
>> > That is most certainly a very valid encoding, and a rather nice one at
>> > that. I hadn't really gone further than: 4*3*2 + 3 < 2^5 :-)
>> >
>> > If you also make OTHER=0, then a valid encoding for unknown is also 0,
>> > which is a nice meaning for 0...
>> >
>> I am not sure how you would cover the 9 possibilities for data source as
>> shown in Table 10-13 using this encoding. Could you show me?
>
> Ah, I think I see the problem, there's multiple L3-snoops, I guess we
> can fix that by extending the {shared, exclusive} to full MESI, growing
> us to 6 bits.
>
> I'm assuming you mean "Table 30-13. Data Source Encoding for Load
> Latency Record", which has 14 values defined.
>
Yes.
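
As a sanity check on the counting argument quoted above (4*3*2 + 3 < 2^5,
and 6 bits once {shared, exclusive} grows to full MESI), here is a minimal
sketch in C. The helper names bits_needed() and n_sources() are made up
for illustration and are not part of the patch.

```c
#include <assert.h>

/* Smallest number of bits b such that 2^b >= n_values (n_values >= 1). */
static unsigned bits_needed(unsigned n_values)
{
    unsigned bits = 0;

    assert(n_values >= 1);
    while ((1u << bits) < n_values)
        bits++;
    return bits;
}

/*
 * Number of distinct data sources in the proposed scheme:
 * {levels} x {accesses} x {states} + {specials}.
 */
static unsigned n_sources(unsigned levels, unsigned accesses,
                          unsigned states, unsigned specials)
{
    return levels * accesses * states + specials;
}
```

With {shared, exclusive}, n_sources(4, 3, 2, 3) is 27 and fits in 5 bits.
With the four MESI states, n_sources(4, 3, 4, 3) is 51 and needs 6 bits.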

> Value  Intel                             Perf
> 0x0    Unknown L3                        Unknown
>
> 0x1    L1                                L1-local
>
> 0x2    Pending core cache HIT            L2-snoop
>        Outstanding core cache miss to

It is not clear to me how you know this is a snoop, or that it is L2.

I suspect this entry means the load hit a line for which a request
was already pending. It could be that the first request came from
the prefetchers and the second is the actual demand.

Let me check with Intel. The table is unclear.

>        the same line was underway
> 0x3    L2                                L2-local
>
> 0x4    L3-snoop, no coherency actions    L3-snoop-I

I am not sure I understand what you mean by local vs. remote
in your terminology.


> 0x5    L3-snoop, found no M              L3-snoop-S
> 0x6    L3-snoop, found M                 L3-snoop-M
>
> 0x8    L3-miss, snoop, shared            RAM-snoop-S
> 0xA    L3-miss, local, shared            RAM-local-S
> 0xB    L3-miss, remote, shared           RAM-remote-S
>
> 0xC    L3-miss, local, exclusive         RAM-local-E
> 0xD    L3-miss, remote, exclusive        RAM-remote-E
>
> 0xE    IO                                IO
> 0xF    uncached                          uncached
>
>
> Leaving us with:
>
> {L1, L2, L3, RAM}x{snoop, local, remote}x{modified, exclusive, shared, invalid} + {unknown, uncached, IO}
>
> Now the question is, is this sufficient to map all data sources from
> other archs as well?
>
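
To make the layout under discussion concrete, here is a rough sketch of
the fielded encoding extended to MESI, with OTHER = 0 so that "unknown"
encodes as 0. This is only one possible reading of the thread: the enum
names, the ds_* helpers, and the numeric order of the MESI states are
assumptions for illustration, not what the patch defines.

```c
#include <assert.h>
#include <stdint.h>

/* bits 0-1: cache level, or a special source when access == DS_OTHER */
enum ds_level   { DS_L1 = 0, DS_L2 = 1, DS_L3 = 2, DS_RAM = 3 };
enum ds_special { DS_UNKNOWN = 0, DS_UNCACHED = 1, DS_IO = 2 };
/* bits 2-3: access type; OTHER is 0 so that "unknown" is all zeroes */
enum ds_access  { DS_OTHER = 0, DS_SNOOP = 1, DS_LOCAL = 2, DS_REMOTE = 3 };
/* bits 4-5: MESI state (numeric order chosen arbitrarily here) */
enum ds_state   { DS_INVALID = 0, DS_SHARED = 1,
                  DS_EXCLUSIVE = 2, DS_MODIFIED = 3 };

/* Pack a regular {level, access, state} triple into 6 bits. */
static uint8_t ds_encode(enum ds_level level, enum ds_access access,
                         enum ds_state state)
{
    assert(access != DS_OTHER);	/* OTHER is reserved for specials */
    return (uint8_t)((state << 4) | (access << 2) | level);
}

/* Pack one of {unknown, uncached, IO}; access and state fields are 0. */
static uint8_t ds_encode_special(enum ds_special special)
{
    return (uint8_t)((DS_OTHER << 2) | special);
}

/* Field accessors. */
static unsigned ds_get_level(uint8_t v)  { return v & 0x3; }
static unsigned ds_get_access(uint8_t v) { return (v >> 2) & 0x3; }
static unsigned ds_get_state(uint8_t v)  { return (v >> 4) & 0x3; }
static int ds_is_special(uint8_t v)      { return ds_get_access(v) == DS_OTHER; }
```

For example, RAM-remote-E from the table above would be
ds_encode(DS_RAM, DS_REMOTE, DS_EXCLUSIVE), and ds_encode_special(DS_UNKNOWN)
is 0. Entries for which the table names no MESI state (L1-local, L2-local)
would still need a convention for the state field, which this sketch does
not settle.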