Re: [RFT v3 1/4] perf cs-etm: Generate branch sample for missed packets

From: Mike Leach
Date: Wed May 30 2018 - 11:04:41 EST


Hi Leo,



On 30 May 2018 at 10:45, Robert Walker <robert.walker@xxxxxxx> wrote:
>
>
> On 28/05/18 23:13, Mathieu Poirier wrote:
>>
>> Leo and/or Robert,
>>
>> On Mon, May 28, 2018 at 04:45:00PM +0800, Leo Yan wrote:
>>>
>>> Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
>>> traces") reworks the samples generation flow from CoreSight trace to
>>> match the correct format so Perf report tool can display the samples
>>> properly.
>>>
>>> But the change has side effect for branch packet handling, it only
>>> generate branch samples by checking previous packet flag
>>> 'last_instr_taken_branch' is true, this results in below three kinds
>>> packets are missed to generate branch samples:
>>>
>>> - The start tracing packet at the beginning of tracing data;
>>> - The exception handling packet;
>>> - If one CS_ETM_TRACE_ON packet is inserted, we also miss to handle it
>>> for branch samples. CS_ETM_TRACE_ON packet itself can give the info
>>> that there have a discontinuity in the trace, on the other hand we
>>> also miss to generate proper branch sample for packets before and
>>> after CS_ETM_TRACE_ON packet.
>>>
>>> This patch is to add branch sample handling for up three kinds packets:
>>>
>>> - In function cs_etm__sample(), check if 'prev_packet->sample_type' is
>>> zero and in this case it generates branch sample for the start tracing
>>> packet; furthermore, we also need to handle the condition for
>>> prev_packet::end_addr is zero in the cs_etm__last_executed_instr();
>>>
>>> - In function cs_etm__sample(), check if 'prev_packet->exc' is true and
>>> generate branch sample for exception handling packet;
>>>
>>> - If there has one CS_ETM_TRACE_ON packet is coming, we firstly generate
>>> branch sample in the function cs_etm__flush(), this can save complete
>>> info for the previous CS_ETM_RANGE packet just before CS_ETM_TRACE_ON
>>> packet. We also generate branch sample for the new CS_ETM_RANGE
>>> packet after CS_ETM_TRACE_ON packet, this have two purposes, the
>>> first one purpose is to save the info for the new CS_ETM_RANGE packet,
>>> the second purpose is to save CS_ETM_TRACE_ON packet info so we can
>>> have hint for a discontinuity in the trace.
>>>
>>> For CS_ETM_TRACE_ON packet, its fields 'packet->start_addr' and
>>> 'packet->end_addr' equal to 0xdeadbeefdeadbeefUL which are emitted in
>>> the decoder layer as dummy value. This patch is to convert these
>>> values to zeros for more readable; this is accomplished by functions
>>> cs_etm__last_executed_instr() and cs_etm__first_executed_instr(). The
>>> later one is a new function introduced by this patch.
>>>
>>> Reviewed-by: Robert Walker <robert.walker@xxxxxxx>
>>> Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
>>> traces")
>>> Signed-off-by: Leo Yan <leo.yan@xxxxxxxxxx>
>>> ---
>>> tools/perf/util/cs-etm.c | 93
>>> +++++++++++++++++++++++++++++++++++++-----------
>>> 1 file changed, 73 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
>>> index 822ba91..8418173 100644
>>> --- a/tools/perf/util/cs-etm.c
>>> +++ b/tools/perf/util/cs-etm.c
>>> @@ -495,6 +495,20 @@ static inline void
>>> cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
>>> static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet
>>> *packet)
>>> {
>>> /*
>>> + * The packet is the start tracing packet if the end_addr is
>>> zero,
>>> + * returns 0 for this case.
>>> + */
>>> + if (!packet->end_addr)
>>> + return 0;
>>
>>
>> What is considered to be the "start tracing packet"? Right now the only
>> two
>> kind of packets inserted in the decoder packet buffer queue are INST_RANGE
>> and
>> TRACE_ON. How can we hit a condition where packet->end-addr == 0?
>>
>>
>>> +
>>> + /*
>>> + * The packet is the CS_ETM_TRACE_ON packet if the end_addr is
>>> + * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
>>> + */
>>> + if (packet->end_addr == 0xdeadbeefdeadbeefUL)
>>> + return 0;
>>
>>
>> As it is with the above, I find triggering on addresses to be brittle and
>> hard
>> to maintain on the long run. Packets all have a sample_type field that
>> should
>> be used in cases like this one. That way we know exactly the condition
>> that is
>> targeted.
>>
>> While working on this set, please spin-off another patch that defines
>> CS_ETM_INVAL_ADDR 0xdeadbeefdeadbeefUL and replace all the cases where the
>> numeral is used. That way we stop using the hard coded value.
>>
>>> +
>>> + /*
>>> * The packet records the execution range with an exclusive end
>>> address
>>> *
>>> * A64 instructions are constant size, so the last executed
>>> @@ -505,6 +519,18 @@ static inline u64 cs_etm__last_executed_instr(struct
>>> cs_etm_packet *packet)
>>> return packet->end_addr - A64_INSTR_SIZE;
>>> }
>>> +static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet
>>> *packet)
>>> +{
>>> + /*
>>> + * The packet is the CS_ETM_TRACE_ON packet if the start_addr is
>>> + * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
>>> + */
>>> + if (packet->start_addr == 0xdeadbeefdeadbeefUL)
>>> + return 0;
>>
>>
>> Same comment as above.
>>
>>> +
>>> + return packet->start_addr;
>>> +}
>>> +
>>> static inline u64 cs_etm__instr_count(const struct cs_etm_packet
>>> *packet)
>>> {
>>> /*
>>> @@ -546,7 +572,7 @@ static void cs_etm__update_last_branch_rb(struct
>>> cs_etm_queue *etmq)
>>> be = &bs->entries[etmq->last_branch_pos];
>>> be->from = cs_etm__last_executed_instr(etmq->prev_packet);
>>> - be->to = etmq->packet->start_addr;
>>> + be->to = cs_etm__first_executed_instr(etmq->packet);
>>> /* No support for mispredict */
>>> be->flags.mispred = 0;
>>> be->flags.predicted = 1;
>>> @@ -701,7 +727,7 @@ static int cs_etm__synth_branch_sample(struct
>>> cs_etm_queue *etmq)
>>> sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
>>> sample.pid = etmq->pid;
>>> sample.tid = etmq->tid;
>>> - sample.addr = etmq->packet->start_addr;
>>> + sample.addr = cs_etm__first_executed_instr(etmq->packet);
>>> sample.id = etmq->etm->branches_id;
>>> sample.stream_id = etmq->etm->branches_id;
>>> sample.period = 1;
>>> @@ -897,13 +923,28 @@ static int cs_etm__sample(struct cs_etm_queue
>>> *etmq)
>>> etmq->period_instructions = instrs_over;
>>> }
>>> - if (etm->sample_branches &&
>>> - etmq->prev_packet &&
>>> - etmq->prev_packet->sample_type == CS_ETM_RANGE &&
>>> - etmq->prev_packet->last_instr_taken_branch) {
>>> - ret = cs_etm__synth_branch_sample(etmq);
>>> - if (ret)
>>> - return ret;
>>> + if (etm->sample_branches && etmq->prev_packet) {
>>> + bool generate_sample = false;
>>> +
>>> + /* Generate sample for start tracing packet */
>>> + if (etmq->prev_packet->sample_type == 0 ||
>>
>>
>> What kind of packet is sample_type == 0 ?
>>
>>> + etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
>>> + generate_sample = true;
>>> +
>>> + /* Generate sample for exception packet */
>>> + if (etmq->prev_packet->exc == true)
>>> + generate_sample = true;
>>
>>
>> Please don't do that. Exception packets have a type of their own and can
>> be
>> added to the decoder packet queue the same way INST_RANGE and TRACE_ON
>> packets
>> are. Moreover exception packet containt an address that, if I'm reading
>> the
>> documenation properly, can be used to keep track of instructions that were
>> executed between the last address of the previous range packet and the
>> address
>> executed just before the exception occurred. Mike and Rob will have to
>> confirm
>> this as the decoder may be doing all that hard work for us.
>>

clarification on the exception packets....

The Opencsd output exception packet gives you the exception number,
and optionally the preferred return address. If this address is
present does depend a lot on the underlying protocol - will normally
be there with ETMv4.
Exceptions are marked differently in the underlying protocol - the
OCSD packets abstract away these differences.

consider the code:

0x1000: <some instructions>
0x1100: BR 0x2000
....
0x2000: <some instructions>
0x2020 BZ r4

Without an exception this would result in the packets

OCSD_RANGE(0x1000,0x1104, Last instr type=Br, taken) // recall that
range packets have start addr inclusive, end addr exclusive.
OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
depends on condition>

Now consider an exception occurring before the BR 0x2000

this will result in:-
OCSD_RANGE(0x1000, 0x1100, Last instr type=Other)
OCSD_EXECEPTION(IRQ, ret-addr 0x1100)
OCSD_RANGE(IRQ_START, IRQ_END+4, Last instr type = BR, taken) //
this is more likely to have multiple ranges / branches before any
return, but simplified here.
OCSD_EXCEPTION_RETURN() // present if exception returns are
explicitly marked in underlying trace - may not always be depending on
circumstances.
OCSD_RANGE(0x1100,0x1104, Last=BR, taken) // continue on with short
range - just the branch
OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
depends on condition>

Now consider the exception occurring after the BR, but before any
other instructions are executed.

OCSD_RANGE(0x1000,0x1104, Last instr type=Br, taken) // recall that
range packets have start addr inclusive, end addr exclusive.
OCSD_EXECEPTION(IRQ, ret-addr 0x2000) // here the preferred return
address is actually the target of the branch.
OCSD_RANGE(IRQ_START, IRQ_END+4, Last instr type = BR, taken) //
this is more likely to have multiple ranges / branches before any
return, but simplified here.
OCSD_RANGE(0x2000,0x2024, Last instr type=Br, <taken / not taken -
depends on condition>

So in general it is possible to arrive in the IRQ_START range with the
previous packet having been either a taken branch, a not taken branch,
or not a branch.
Care must be taken - whether AutoFDO or normal trace disassembly not
to assume that having the last range packet as a taken branch means
that the next range packet is the target, if there is an intervening
exception packet.

Regards

Mike

>>> +
>>> + /* Generate sample for normal branch packet */
>>> + if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
>>> + etmq->prev_packet->last_instr_taken_branch)
>>> + generate_sample = true;
>>> +
>>> + if (generate_sample) {
>>> + ret = cs_etm__synth_branch_sample(etmq);
>>> + if (ret)
>>> + return ret;
>>> + }
>>> }
>>> if (etm->sample_branches || etm->synth_opts.last_branch) {
>>> @@ -922,11 +963,16 @@ static int cs_etm__sample(struct cs_etm_queue
>>> *etmq)
>>> static int cs_etm__flush(struct cs_etm_queue *etmq)
>>> {
>>> int err = 0;
>>> + struct cs_etm_auxtrace *etm = etmq->etm;
>>> struct cs_etm_packet *tmp;
>>> - if (etmq->etm->synth_opts.last_branch &&
>>> - etmq->prev_packet &&
>>> - etmq->prev_packet->sample_type == CS_ETM_RANGE) {
>>> + if (!etmq->prev_packet)
>>> + return 0;
>>> +
>>> + if (etmq->prev_packet->sample_type != CS_ETM_RANGE)
>>> + return 0;
>>> +
>>> + if (etmq->etm->synth_opts.last_branch) {
>>
>>
>> If you add:
>>
>> if (!etmq->etm->synth_opts.last_branch)
>> return 0;
>>
>> You can avoid indenting the whole block.
>>
>>> /*
>>> * Generate a last branch event for the branches left in
>>> the
>>> * circular buffer at the end of the trace.
>>> @@ -939,18 +985,25 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
>>> err = cs_etm__synth_instruction_sample(
>>> etmq, addr,
>>> etmq->period_instructions);
>>> + if (err)
>>> + return err;
>>> etmq->period_instructions = 0;
>>> + }
>>> - /*
>>> - * Swap PACKET with PREV_PACKET: PACKET becomes
>>> PREV_PACKET for
>>> - * the next incoming packet.
>>> - */
>>> - tmp = etmq->packet;
>>> - etmq->packet = etmq->prev_packet;
>>> - etmq->prev_packet = tmp;
>>> + if (etm->sample_branches) {
>>> + err = cs_etm__synth_branch_sample(etmq);
>>> + if (err)
>>> + return err;
>>> }
>>> - return err;
>>> + /*
>>> + * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
>>> + * the next incoming packet.
>>> + */
>>> + tmp = etmq->packet;
>>> + etmq->packet = etmq->prev_packet;
>>> + etmq->prev_packet = tmp;
>>
>>
>> Robert, I remember noticing that when you first submitted the code but
>> forgot to
>> go back to it. What is the point of swapping the packets? I understand
>>
>> etmq->prev_packet = etmq->packet;
>>
>> But not
>>
>> etmq->packet = tmp;
>>
>> After all etmq->packet will be clobbered as soon as
>> cs_etm_decoder__get_packet()
>> is called, which is alwasy right after either cs_etm__sample() or
>> cs_etm__flush().
>>
>
> This is code I inherited from the original versions of these patches, but it
> works because:
> - etmq->packet and etmq->prev_packet are pointers to struct cs_etm_packet
> allocated by zalloc() in cs_etm__alloc_queue()
> - cs_etm_decoder__get_packet() takes a pointer to struct cs_etm_packet and
> copies the contents of the first packet from the queue into the passed
> location with:
> *packet = decoder->packet_buffer[decoder->head]
>
> So the swap code is only swapping the pointers over, not the contents of the
> packets.
>
> Regards
>
> Rob
>
>
>
>> Thanks,
>> Mathieu
>>
>>
>>
>>> + return 0;
>>> }
>>> static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
>>> --
>>> 2.7.4
>>>
>



--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK