Re: [PATCH] perf arm-spe: Use SPE data source for neoverse cores

From: Ali Saidi
Date: Mon Jan 24 2022 - 18:55:31 EST


On 1/24/22, 11:24 AM, "James Clark" <james.clark@xxxxxxx> wrote:
>On 21/01/2022 18:24, Ali Saidi wrote:
>> When synthesizing data from SPE, augment the type with source information
>> for Arm Neoverse cores. The field is IMPLDEF but the Neoverse cores all use
>> the same encoding. I can't find encoding information for any other SPE
>> implementations to unify their choices with Arm's, so that is left for future
>> work.
>>
>> This change enables the expected behavior of perf c2c on a system with SPE,
>> where lines that are shared among multiple cores show up in the perf c2c output.
>>
>> Signed-off-by: Ali Saidi <alisaidi@xxxxxxxxxx>
>> ---
>> .../util/arm-spe-decoder/arm-spe-decoder.c | 1 +
>> .../util/arm-spe-decoder/arm-spe-decoder.h | 12 +++++
>> tools/perf/util/arm-spe.c | 48 ++++++++++++++-----
>> 3 files changed, 49 insertions(+), 12 deletions(-)
>>
>[...]
>> +static u64 arm_spe__synth_data_source(const struct arm_spe_record *record, u64 midr)
>> {
>> union perf_mem_data_src data_src = { 0 };
>> + bool is_neoverse = is_midr_in_range(midr, neoverse_spe);
>>
>> if (record->op == ARM_SPE_LD)
>> data_src.mem_op = PERF_MEM_OP_LOAD;
>> @@ -409,19 +418,30 @@ static u64 arm_spe__synth_data_source(const struct arm_spe_record *record)
>> data_src.mem_op = PERF_MEM_OP_STORE;
>>
>> if (record->type & (ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS)) {
>> - data_src.mem_lvl = PERF_MEM_LVL_L3;
>> + if (is_neoverse && record->source == ARM_SPE_NV_DRAM) {
>> + data_src.mem_lvl = PERF_MEM_LVL_LOC_RAM | PERF_MEM_LVL_HIT;
>> + } else if (is_neoverse && record->source == ARM_SPE_NV_PEER_CLSTR) {
>> + data_src.mem_snoop = PERF_MEM_SNOOP_HITM;
>
>I'm not following how LLC_ACCESS | LLC_MISS ends up as HITM in this case (ARM_SPE_NV_PEER_CLSTR)?
>I thought there was no way to determine a HITM from SPE. Wouldn't one of the other values
>like PERF_MEM_SNOOP_MISS be more accurate?

Thanks for taking a look, James.

I'd really like someone familiar with perf c2c output to get similar output
when running on an Arm system with SPE. There are obviously large
micro-architectural differences that the data_src abstraction hides, but
fundamentally my understanding of x86 HITM is that the line was found in the
LLC's snoop filter as owned by another core, so the request has to go to that
core to get the line. I'm not 100% sure whether, on x86, the line is really
guaranteed to be dirty, and it won't always be dirty on a Neoverse system.
However, since the SPE source indicates the line came from another core, this
is a core-to-core transfer of a line currently owned by another CPU core, and
that is the interesting data point for finding and eliminating frequent
core-to-core transfers (true or false sharing).
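To make the proposed mapping concrete, here is a minimal C sketch of the
branches quoted above for an LLC access/miss record. It is not the patch code:
the MEM_* flags are hypothetical stand-ins with arbitrary bit values (real code
would use union perf_mem_data_src and the PERF_MEM_* constants from
<linux/perf_event.h>), the nv_source enum is a hypothetical subset of the
Neoverse source values whose real encodings are not reproduced here, and the
final fallback to a plain L3 level is an assumption based on the line the
patch replaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the perf data_src flags. Bit values are
 * illustrative only, NOT the real <linux/perf_event.h> encodings. */
#define MEM_LVL_HIT     (1u << 0)
#define MEM_LVL_L3      (1u << 1)
#define MEM_LVL_LOC_RAM (1u << 2)
#define MEM_SNOOP_HITM  (1u << 1)

/* Hypothetical subset of Neoverse SPE data sources (encodings omitted). */
enum nv_source { NV_OTHER, NV_PEER_CLSTR, NV_DRAM };

struct synth_src {
	uint32_t mem_lvl;
	uint32_t mem_snoop;
};

/* Map an LLC access/miss record to level/snoop flags, following the
 * branches quoted from the patch. */
static struct synth_src map_llc_record(bool is_neoverse, enum nv_source src)
{
	struct synth_src out = { 0 };	/* mirrors data_src = { 0 } */

	assert(src <= NV_DRAM);		/* caller must pass a known source */

	if (is_neoverse && src == NV_DRAM) {
		/* Line came from local DRAM: report a DRAM hit. */
		out.mem_lvl = MEM_LVL_LOC_RAM | MEM_LVL_HIT;
	} else if (is_neoverse && src == NV_PEER_CLSTR) {
		/* Another core supplied the line: report an LLC (snoop
		 * filter) hit with HITM so c2c sees a core-to-core
		 * transfer, even though the line may not be dirty. */
		out.mem_snoop = MEM_SNOOP_HITM;
		out.mem_lvl = MEM_LVL_L3 | MEM_LVL_HIT;
	} else {
		/* Assumed fallback: keep the pre-patch L3-only result. */
		out.mem_lvl = MEM_LVL_L3;
	}
	return out;
}
```

For example, map_llc_record(true, NV_PEER_CLSTR) yields an L3 hit with HITM
set, which is exactly the case under discussion, while a non-Neoverse core
keeps the old L3-only result.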

>> + data_src.mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
>
>This one also adds PERF_MEM_LVL_HIT even though the check of "if (record->type & ARM_SPE_LLC_MISS)"
>hasn't happened yet. Maybe some comments would make it a bit clearer, but at the moment it's
>not obvious how the result is derived because there are some things that don't add up like
>ARM_SPE_LLC_MISS == PERF_MEM_LVL_HIT.

Assuming the above is correct, my reading of the existing code that creates the
c2c output is that an access marked as an LLC hit doesn't necessarily mean the
data was present in the LLC. I don't see how it could, given that LLC_HIT +
HITM means the line was dirty in another CPU's cache. So LLC_HIT + HITM seems
to mean that the access hit in the LLC snoop filter and required a different
core to provide the line. This and the above certainly deserve a comment
explaining why the miss is attributed this way, if the approach is otherwise
acceptable.
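A rough sketch of how a consumer could bucket the resulting loads under that
reading. The bucket names and classify_load() are hypothetical and only
loosely modeled on c2c-style accounting; this is not perf's actual code, and
the flag values are illustrative stand-ins, not the real perf_event.h
encodings.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values only (not the real perf_event.h encodings). */
#define MEM_LVL_HIT     (1u << 0)
#define MEM_LVL_L3      (1u << 1)
#define MEM_LVL_LOC_RAM (1u << 2)
#define MEM_SNOOP_HITM  (1u << 1)

/* Hypothetical load buckets, loosely modeled on c2c-style statistics. */
enum load_bucket {
	BUCKET_OTHER,
	BUCKET_LLC_HIT,		/* data really served by the LLC */
	BUCKET_LLC_HITM,	/* LLC snoop filter hit, another core supplied it */
	BUCKET_LOCAL_DRAM,
};

static enum load_bucket classify_load(uint32_t mem_lvl, uint32_t mem_snoop)
{
	if (!(mem_lvl & MEM_LVL_HIT))
		return BUCKET_OTHER;

	if (mem_lvl & MEM_LVL_L3) {
		/* LLC "hit" + HITM: the LLC knew another core owned the
		 * line and that core provided it (core-to-core transfer). */
		if (mem_snoop & MEM_SNOOP_HITM)
			return BUCKET_LLC_HITM;
		return BUCKET_LLC_HIT;
	}

	if (mem_lvl & MEM_LVL_LOC_RAM)
		return BUCKET_LOCAL_DRAM;

	return BUCKET_OTHER;
}
```

Under this reading an L3 + HIT + HITM load lands in the HITM bucket, i.e. it is
counted as a core-to-core transfer rather than as data served by the LLC
itself.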

Thanks,
Ali