Re: [PATCH 5/5] perf: arm_spe: Decode SVE events

From: André Przywara
Date: Mon Sep 28 2020 - 10:00:18 EST


On 28/09/2020 14:21, Dave Martin wrote:

Hi Dave,

> On Tue, Sep 22, 2020 at 11:12:25AM +0100, Andre Przywara wrote:
>> The Scalable Vector Extension (SVE) is an ARMv8 architecture extension
>> that introduces very long vector operations (up to 2048 bits).
>
> (8192, in fact, though don't expect to see that on real hardware any
> time soon... qemu and the Arm fast model can do it, though.)
>
>> The SPE profiling feature can tag SVE instructions with additional
>> properties like predication or the effective vector length.
>>
>> Decode the new operation type bits in the SPE decoder to allow the perf
>> tool to correctly report about SVE instructions.
>
>
> I don't know anything about SPE, so just commenting on a few minor
> things that catch my eye here.

Many thanks for taking a look!
Please note that I actually missed a prior submission by Wei, so the
code changes here will end up in:
https://lore.kernel.org/patchwork/patch/1288413/

But your two points below magically apply to his patch as well, so....

>
>> Signed-off-by: Andre Przywara <andre.przywara@xxxxxxx>
>> ---
>> .../arm-spe-decoder/arm-spe-pkt-decoder.c | 48 ++++++++++++++++++-
>> 1 file changed, 47 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
>> index a033f34846a6..f0c369259554 100644
>> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
>> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
>> @@ -372,8 +372,35 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
>> }
>> case ARM_SPE_OP_TYPE:
>> switch (idx) {
>> - case 0: return snprintf(buf, buf_len, "%s", payload & 0x1 ?
>> + case 0: {
>> + size_t blen = buf_len;
>> +
>> + if ((payload & 0x89) == 0x08) {
>> + ret = snprintf(buf, buf_len, "SVE");
>> + buf += ret;
>> + blen -= ret;
>
> (Nit: can ret be < 0 ? I've never been 100% clear on this myself for
> the s*printf() family -- if this assumption is widespread in perf tool
> a lready that I guess just go with the flow.)

Yeah, some parts of the code in here check for -1, actually, but doing
this on every call to snprintf would push this current code over the
edge - and I cowardly avoided a refactoring ;-)

Please note that his is perf userland, and also we are printing constant
strings here.
Although admittedly this starts to sounds like an excuse now ...

> I wonder if this snprintf+increment+decrement sequence could be wrapped
> up as a helper, rather than having to be repeated all over the place.

Yes, I was hoping nobody would notice ;-)

>> + if (payload & 0x2)
>> + ret = snprintf(buf, buf_len, " FP");
>> + else
>> + ret = snprintf(buf, buf_len, " INT");
>> + buf += ret;
>> + blen -= ret;
>> + if (payload & 0x4) {
>> + ret = snprintf(buf, buf_len, " PRED");
>> + buf += ret;
>> + blen -= ret;
>> + }
>> + /* Bits [7..4] encode the vector length */
>> + ret = snprintf(buf, buf_len, " EVLEN%d",
>> + 32 << ((payload >> 4) & 0x7));
>
> Isn't this just extracting 3 bits (0x7)?

Ah, right, the comment is wrong. It's actually bits [6:4].

> And what unit are we aiming
> for here: is it the number of bytes per vector, or something else? I'm
> confused by the fact that this will go up in steps of 32, which doesn't
> seem to match up to the architecure.

So this is how SPE encodes the effective vector length in its payload:
the format is described in section "D10.2.7 Operation Type packet" in a
(recent) ARMv8 ARM. I put the above statement in a C file and ran all
input values through it, it produced the exact *bit* length values as in
the spec.

Is there any particular pattern you are concerned about?
I admit this is somewhat hackish, I can do an extra function to put some
comments in there.

>
> I notice that bit 7 has to be zero to get into this if() though.
>
>> + buf += ret;
>> + blen -= ret;
>> + return buf_len - blen;
>> + }
>> +
>> + return snprintf(buf, buf_len, "%s", payload & 0x1 ?
>> "COND-SELECT" : "INSN-OTHER");
>> + }
>> case 1: {
>> size_t blen = buf_len;
>>
>> @@ -403,6 +430,25 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
>> ret = snprintf(buf, buf_len, " NV-SYSREG");
>> buf += ret;
>> blen -= ret;
>> + } else if ((payload & 0x0a) == 0x08) {
>> + ret = snprintf(buf, buf_len, " SVE");
>> + buf += ret;
>> + blen -= ret;
>> + if (payload & 0x4) {
>> + ret = snprintf(buf, buf_len, " PRED");
>> + buf += ret;
>> + blen -= ret;
>> + }
>> + if (payload & 0x80) {
>> + ret = snprintf(buf, buf_len, " SG");
>> + buf += ret;
>> + blen -= ret;
>> + }
>> + /* Bits [7..4] encode the vector length */
>> + ret = snprintf(buf, buf_len, " EVLEN%d",
>> + 32 << ((payload >> 4) & 0x7));
>
> Same comment as above. Maybe have a common helper for decoding the
> vector length bits so it can be fixed in a single place?

Yup. Although I wonder if this is the smallest of the problems with this
function going forward.

Cheers,
Andre

>
>> + buf += ret;
>> + blen -= ret;
>> } else if (payload & 0x4) {
>> ret = snprintf(buf, buf_len, " SIMD-FP");
>> buf += ret;
>
> Cheers
> ---Dave
>