Re: [PATCH v2 10/14] perf arm-spe: Refactor event type handling

From: André Przywara
Date: Wed Oct 21 2020 - 05:21:19 EST


On 21/10/2020 05:54, Leo Yan wrote:

Hi Leo,

> On Tue, Oct 20, 2020 at 10:54:16PM +0100, André Przywara wrote:
>> On 29/09/2020 14:39, Leo Yan wrote:
>>
>> Hi,
>>
>>> Use macros instead of the enum values for event types, this is more
>>> directive and without bit shifting when parse packet.
>>>
>>> Signed-off-by: Leo Yan <leo.yan@xxxxxxxxxx>
>>> ---
>>> .../util/arm-spe-decoder/arm-spe-decoder.c | 16 +++++++-------
>>> .../util/arm-spe-decoder/arm-spe-decoder.h | 17 --------------
>>> .../arm-spe-decoder/arm-spe-pkt-decoder.c | 22 +++++++++----------
>>> .../arm-spe-decoder/arm-spe-pkt-decoder.h | 16 ++++++++++++++
>>> 4 files changed, 35 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>>> index 9d3de163d47c..ac66e7f42a58 100644
>>> --- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>>> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>>> @@ -168,31 +168,31 @@ static int arm_spe_read_record(struct arm_spe_decoder *decoder)
>>> case ARM_SPE_OP_TYPE:
>>> break;
>>> case ARM_SPE_EVENTS:
>>> - if (payload & BIT(EV_L1D_REFILL))
>>> + if (payload & SPE_EVT_PKT_L1D_REFILL)
>>
>> Not sure this (and the others below) are an improvement? I liked the
>> enum below, and reading BIT() here tells me that it's a bitmask.
>
> Agreed.
>
>>> decoder->record.type |= ARM_SPE_L1D_MISS;
>>>
>>> - if (payload & BIT(EV_L1D_ACCESS))
>>> + if (payload & SPE_EVT_PKT_L1D_ACCESS)
>>> decoder->record.type |= ARM_SPE_L1D_ACCESS;
>>>
>>> - if (payload & BIT(EV_TLB_WALK))
>>> + if (payload & SPE_EVT_PKT_TLB_WALK)
>>> decoder->record.type |= ARM_SPE_TLB_MISS;
>>>
>>> - if (payload & BIT(EV_TLB_ACCESS))
>>> + if (payload & SPE_EVT_PKT_TLB_ACCESS)
>>> decoder->record.type |= ARM_SPE_TLB_ACCESS;
>>>
>>> if ((idx == 2 || idx == 4 || idx == 8) &&
>>> - (payload & BIT(EV_LLC_MISS)))
>>> + (payload & SPE_EVT_PKT_LLC_MISS))
>>> decoder->record.type |= ARM_SPE_LLC_MISS;
>>>
>>> if ((idx == 2 || idx == 4 || idx == 8) &&
>>> - (payload & BIT(EV_LLC_ACCESS)))
>>> + (payload & SPE_EVT_PKT_LLC_ACCESS))
>>> decoder->record.type |= ARM_SPE_LLC_ACCESS;
>>>
>>> if ((idx == 2 || idx == 4 || idx == 8) &&
>>> - (payload & BIT(EV_REMOTE_ACCESS)))
>>> + (payload & SPE_EVT_PKT_REMOTE_ACCESS))
>>> decoder->record.type |= ARM_SPE_REMOTE_ACCESS;
>>>
>>> - if (payload & BIT(EV_MISPRED))
>>> + if (payload & SPE_EVT_PKT_MISPREDICTED)
>>> decoder->record.type |= ARM_SPE_BRANCH_MISS;
>>>
>>> break;
>>> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
>>> index a5111a8d4360..24727b8ca7ff 100644
>>> --- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
>>> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
>>> @@ -13,23 +13,6 @@
>>>
>>> #include "arm-spe-pkt-decoder.h"
>>>
>>> -enum arm_spe_events {
>>> - EV_EXCEPTION_GEN = 0,
>>> - EV_RETIRED = 1,
>>> - EV_L1D_ACCESS = 2,
>>> - EV_L1D_REFILL = 3,
>>> - EV_TLB_ACCESS = 4,
>>> - EV_TLB_WALK = 5,
>>> - EV_NOT_TAKEN = 6,
>>> - EV_MISPRED = 7,
>>> - EV_LLC_ACCESS = 8,
>>> - EV_LLC_MISS = 9,
>>> - EV_REMOTE_ACCESS = 10,
>>> - EV_ALIGNMENT = 11,
>>> - EV_PARTIAL_PREDICATE = 17,
>>> - EV_EMPTY_PREDICATE = 18,
>>> -};
>>
>> So what about keeping this, but moving it into the other header file?
>
> Will do. This is more directive, especially if we consider every bit
> indicates an event type :)
>
>> coding-style.rst says: "Enums are preferred when defining several
>> related constants."
>
> Thanks for pasting the coding style, it's useful. I agree that using
> BIT() + enum is better form, will refine the patch for this.
>
>>> -
>>> enum arm_spe_sample_type {
>>> ARM_SPE_L1D_ACCESS = 1 << 0,
>>> ARM_SPE_L1D_MISS = 1 << 1,
>>> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
>>> index ed0f4c74dfc5..b8f343320abf 100644
>>> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
>>> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
>>> @@ -284,58 +284,58 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
>>> if (ret < 0)
>>> return ret;
>>>
>>> - if (payload & 0x1) {
>>> + if (payload & SPE_EVT_PKT_GEN_EXCEPTION) {
>>
>> Having the bitmask here directly is indeed not very nice and error
>> prone. But I would rather see the above solution:
>> if (payload & BIT(EV_EXCEPTION_GEN)) {
>
> Will do.
>
>>> ret = arm_spe_pkt_snprintf(&buf, &blen, " EXCEPTION-GEN");
>>> if (ret < 0)
>>> return ret;
>>> }
>>> - if (payload & 0x2) {
>>> + if (payload & SPE_EVT_PKT_ARCH_RETIRED) {
>>> ret = arm_spe_pkt_snprintf(&buf, &blen, " RETIRED");
>>> if (ret < 0)
>>> return ret;
>>> }
>>> - if (payload & 0x4) {
>>> + if (payload & SPE_EVT_PKT_L1D_ACCESS) {
>>> ret = arm_spe_pkt_snprintf(&buf, &blen, " L1D-ACCESS");
>>> if (ret < 0)
>>> return ret;
>>> }
>>> - if (payload & 0x8) {
>>> + if (payload & SPE_EVT_PKT_L1D_REFILL) {
>>> ret = arm_spe_pkt_snprintf(&buf, &blen, " L1D-REFILL");
>>> if (ret < 0)
>>> return ret;
>>> }
>>> - if (payload & 0x10) {
>>> + if (payload & SPE_EVT_PKT_TLB_ACCESS) {
>>> ret = arm_spe_pkt_snprintf(&buf, &blen, " TLB-ACCESS");
>>> if (ret < 0)
>>> return ret;
>>> }
>>> - if (payload & 0x20) {
>>> + if (payload & SPE_EVT_PKT_TLB_WALK) {
>>> ret = arm_spe_pkt_snprintf(&buf, &blen, " TLB-REFILL");
>>> if (ret < 0)
>>> return ret;
>>> }
>>> - if (payload & 0x40) {
>>> + if (payload & SPE_EVT_PKT_NOT_TAKEN) {
>>> ret = arm_spe_pkt_snprintf(&buf, &blen, " NOT-TAKEN");
>>> if (ret < 0)
>>> return ret;
>>> }
>>> - if (payload & 0x80) {
>>> + if (payload & SPE_EVT_PKT_MISPREDICTED) {
>>> ret = arm_spe_pkt_snprintf(&buf, &blen, " MISPRED");
>>> if (ret < 0)
>>> return ret;
>>> }
>>> if (idx > 1) {
>>
>> Do you know what the purpose of this comparison is? Surely payload would
>> not contain more bits than would fit in "idx" bytes? So is this some
>> attempt of an optimisation?
>
> Here "idx" is for payload size (in bytes); you could see function
> arm_spe_get_events() calculate the payload size:
>
> packet->index = PAYLOAD_LEN(buf[0]);
>
> Please note, the raw payload size (field "sz" in header) value is:
>
> 0b00 Byte.
> 0b01 Halfword.
> 0b10 Word.
> 0b11 Doubleword.
>
> After using PAYLOAD_LEN(), the payload size is converted to value in
> byte, so:
>
> packet->index = 1 << "sz";
>
> 1 Byte
> 2 Halfword
> 4 Word
> 8 Doubleword
>
> In Armv8 ARM, chapter "D10.2.6 Events packet", we can see the events
> "Remote access", "Last Level cache miss" and "Last Level cache access"
> are only valid when "sz" is equal or longer than Halfword, thus idx is
> 2/4/8; this is why here checks the condition "if (idx > 1)".

Right, thanks for the explanation. But in the end this is just a lot of
words for: "You can only fit n*8 bits in n bytes.", isn't it?
So if the payload size is 1 byte, we can't have bits 8 or higher.

And in arm_spe_get_payload() we load payload with casts, so the upper
bits, beyond the payload size, must always be 0? Regardless of what was
in the buffer. Or am I looking at the wrong function?
Even if that wouldn't be the case, I'd rather mask it here again, so
that we can rely on this, and lose the extra check.
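
A minimal sketch of what "mask it here again" could look like. The
helper name spe_mask_payload() is hypothetical and not part of the
perf tree; it only assumes that the payload is held in a 64-bit value
and that the payload length is given in bytes:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in the perf tree): keep only the low
 * payload_len * 8 bits of an events payload, so that any bit beyond
 * the payload size reads as zero.
 */
static inline uint64_t spe_mask_payload(uint64_t payload,
					unsigned int payload_len)
{
	/* A full-width payload needs no masking; this also avoids an
	 * undefined shift by 64 bits. */
	if (payload_len >= sizeof(payload))
		return payload;

	return payload & ((UINT64_C(1) << (payload_len * 8)) - 1);
}
```

With the payload masked like this, a test for bit 8 or higher on a
1-byte payload can never be true, so the size check in front of it
would not be needed.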

>
>> If so, I doubt it's really useful, the
>> compiler might find a smarter solution to the problem. Just continuing
>> with the bit mask comparison would make it look nicer, I think.
>
> ARMv8 ARM gives out "Otherwise this bit reads-as-zero.", IIUC this
> suggests to firstly check the size, if cannot meet the size requirement,
> then the Event bit should be reads-as-zero.

But as mentioned above, we take care of this already:
	switch (payload_len) {
	case 1: packet->payload = *(uint8_t *)buf; break;
	case 2: packet->payload = le16_to_cpu(*(uint16_t *)buf); break;
	...
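
To illustrate the point, here is a small self-contained sketch. It is
not the decoder code: spe_load_payload() and spe_has_llc_miss() are
made-up names, BIT() is a local stand-in for the kernel macro, and the
payload is assembled byte by byte instead of using casts and
le16_to_cpu(), purely to keep the example portable. Only the enum
values are taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro. */
#define BIT(n)	(UINT64_C(1) << (n))

/* Subset of the event bit positions from the enum in the patch. */
enum arm_spe_events {
	EV_LLC_ACCESS		= 8,
	EV_LLC_MISS		= 9,
	EV_REMOTE_ACCESS	= 10,
};

/*
 * Hypothetical loader: assemble a little-endian payload from the
 * first len bytes of buf. Bytes beyond len never reach the result,
 * so the upper bits are always zero.
 */
static uint64_t spe_load_payload(const unsigned char *buf, size_t len)
{
	uint64_t payload = 0;
	size_t i;

	for (i = 0; i < len && i < sizeof(payload); i++)
		payload |= (uint64_t)buf[i] << (8 * i);

	return payload;
}

/* Plain bitmask test, without an extra check on the payload size. */
static int spe_has_llc_miss(const unsigned char *buf, size_t len)
{
	return (spe_load_payload(buf, len) & BIT(EV_LLC_MISS)) != 0;
}
```

If the second byte in the buffer has the LLC miss bit set, the test is
true for a 2-byte payload and false for a 1-byte payload, even though
the check itself never looks at the size.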

Thanks,
Andre