Re: [PATCH 1/2] include: trace: Add SCMI header with trace events

From: Jim Quinlan
Date: Wed Dec 18 2019 - 11:37:18 EST




On 12/18/19 7:09 AM, Sudeep Holla wrote:
> On Mon, Dec 16, 2019 at 05:15:54PM -0500, Jim Quinlan wrote:
>> From: Lukasz Luba <lukasz.luba@xxxxxxx>
>>
>> +
>> +TRACE_EVENT(scmi_xfer_begin,
>> +	TP_PROTO(u8 id, u8 protocol_id, u16 seq, bool poll),
>> +	TP_ARGS(id, protocol_id, seq, poll),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(u8, id)
>> +		__field(u8, protocol_id)
>> +		__field(u16, seq)
>> +		__field(bool, poll)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->id = id;
>> +		__entry->protocol_id = protocol_id;
>> +		__entry->seq = seq;
>> +		__entry->poll = poll;
>> +	),
>> +
>> +	TP_printk("id=%u protocol_id=%u seq=%u poll=%u", __entry->id,
>> +		  __entry->protocol_id, __entry->seq, __entry->poll)
>> +);
>> +
>> +TRACE_EVENT(scmi_xfer_end,
>> +	TP_PROTO(u8 id, u8 protocol_id, u16 seq, u32 status),
>> +	TP_ARGS(id, protocol_id, seq, status),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(u8, id)
>> +		__field(u8, protocol_id)
>> +		__field(u16, seq)
>> +		__field(u32, status)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->id = id;
>> +		__entry->protocol_id = protocol_id;
>> +		__entry->seq = seq;
>> +		__entry->status = status;
>> +	),
>> +
>> +	TP_printk("id=%u protocol_id=%u seq=%u status=%u", __entry->id,
>> +		  __entry->protocol_id, __entry->seq, __entry->status)
>> +);
>>
>> Hello,
>>
>> When there are multiple messages in the mbox queue, I've found it
>> a chore matching up the 'begin' event with the 'end' event for each
>> SCMI msg. The id (command) may not be unique, the proto_id may not be
>> unique, and the seq may not be unique.
> I agree on the id and proto_id part easily, and the seq may not be unique
> if and only if the previous command has completed.
>
>> The combination of the three may not be unique.
> Not 100% sure on that. I remember one of the issues you reported where the OS
> times out and the platform may still be processing it. That's one of the
> cases where the seq id may get re-assigned, but now that's fixed and the
> scenario may not happen. I am trying to understand why you think it
> is not unique?
If I submit a series of five clk_enable() calls via SCMI and they are all
executed sequentially, they will most likely have the same seq-proto-cmd value
-- do you agree? Typically one can match the begin tracepoint with the end
because they come in pairs, but when four protocols are in use and the mbox
queue holds more than one message, it is difficult to eyeball the trace output
and get a good idea of what is going on. If one uses a post-processing script,
that's another story (a rough sketch of what I mean is below).
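
For what it's worth, the post-processing I have in mind is nothing fancier
than the user-space sketch below (illustrative only -- the field names just
follow the TP_printk() format above, everything else is made up). It pairs
each 'end' line with an outstanding 'begin' line carrying the same
(id, protocol_id, seq) triple, which is good enough for eyeballing but is of
course still ambiguous whenever that triple repeats:

/*
 * Rough illustration only: pair scmi_xfer_begin lines with
 * scmi_xfer_end lines read from the trace buffer on stdin, keyed on
 * the (id, protocol_id, seq) triple.
 */
#include <stdio.h>
#include <string.h>

#define MAX_PENDING 256

struct pending {
	int used;
	unsigned int id, proto, seq;
	char line[512];
};

static struct pending pend[MAX_PENDING];

static int parse_key(const char *line, unsigned int *id,
		     unsigned int *proto, unsigned int *seq)
{
	const char *p = strstr(line, "id=");

	return p && sscanf(p, "id=%u protocol_id=%u seq=%u",
			   id, proto, seq) == 3;
}

int main(void)
{
	char line[512];
	unsigned int id, proto, seq;
	int i;

	while (fgets(line, sizeof(line), stdin)) {
		if (strstr(line, "scmi_xfer_begin:") &&
		    parse_key(line, &id, &proto, &seq)) {
			/* remember the begin line in a free slot */
			for (i = 0; i < MAX_PENDING; i++) {
				if (!pend[i].used) {
					pend[i].used = 1;
					pend[i].id = id;
					pend[i].proto = proto;
					pend[i].seq = seq;
					strncpy(pend[i].line, line,
						sizeof(pend[i].line) - 1);
					break;
				}
			}
		} else if (strstr(line, "scmi_xfer_end:") &&
			   parse_key(line, &id, &proto, &seq)) {
			/* print the matched begin/end pair, blank line after */
			for (i = 0; i < MAX_PENDING; i++) {
				if (pend[i].used && pend[i].id == id &&
				    pend[i].proto == proto &&
				    pend[i].seq == seq) {
					printf("%s%s\n", pend[i].line, line);
					pend[i].used = 0;
					break;
				}
			}
		}
	}
	return 0;
}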
>
>> Would it make sense to assign a monotonically increasing ID to each
>> msg so that one can easily match the two events for each msg?
> I am not sure if we need to maintain a tracker/counter just for trace
> purposes.
I was just suggesting this for consideration -- if you deem it unhelpful, or
its usefulness cannot be demonstrated, by all means do not add it.
I have an alternative method for logging SCMI events that I prefer
(see below as to why).
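
To be concrete about the counter idea, something along the lines of the
sketch below is all I meant -- the names are illustrative, not the actual
driver code:

/*
 * Illustrative sketch only: tag each xfer with a monotonically
 * increasing id at allocation time, purely so the trace events can be
 * correlated.  The names here are made up, not the real SCMI driver.
 */
#include <linux/atomic.h>

static atomic_t transfer_last_id = ATOMIC_INIT(0);

struct scmi_xfer_sketch {
	int transfer_id;		/* new: unique per message, trace-only */
	/* ... existing xfer fields ... */
};

static void scmi_xfer_assign_id(struct scmi_xfer_sketch *xfer)
{
	/* one atomic op per message; printed by the begin/end events */
	xfer->transfer_id = atomic_inc_return(&transfer_last_id);
}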
>
>> This id could be the result of an atomic increment and
>> could be stored in the xfer structure. Of course, it would be one of
>> the values printed out in the events.
>>
>> Also, would you consider a third event, right after the
>> scmi_fetch_response() invocation in scmi_rx_callback()? I've found
>> this to be insightful in situations where we were debugging a timeout.
>>
>> I'm fine if you elect not to do the above; I just wanted to post
>> this for your consideration.
>>
> I am interested in the scenario where we can make use of this, and I can
> also help in testing it if we add it. I am not against it, but I don't see
> the need for it.
I have a test driver that forks four threads, each of which indirectly
creates SCMI messages for a different protocol (sensor, clk, perf, brcm). In
other words, it is a stress test for the SCMI infrastructure and the platform's
response. I suspect you have a similar test configuration? I set this up so we
can reproduce some problems without having to run the entire middleware stack.
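
Roughly, the driver has the shape sketched below, heavily trimmed -- the
names and the per-protocol request callbacks are placeholders, not real code;
each callback issues one request that ends up as a single SCMI message:

#include <linux/kernel.h>
#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/module.h>

struct stress_thread {
	const char *name;
	int (*one_request)(void);	/* placeholder per-protocol op */
	struct task_struct *task;
};

static struct stress_thread threads[4];	/* sensor, clk, perf, brcm */

static int stress_fn(void *data)
{
	struct stress_thread *st = data;

	while (!kthread_should_stop()) {
		if (st->one_request)
			st->one_request();	/* one SCMI message */
		usleep_range(100, 500);
	}
	return 0;
}

static int __init stress_init(void)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(threads); i++)
		threads[i].task = kthread_run(stress_fn, &threads[i],
					      "scmi-stress-%d", i);
	return 0;
}

static void __exit stress_exit(void)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(threads); i++)
		if (!IS_ERR_OR_NULL(threads[i].task))
			kthread_stop(threads[i].task);
}

module_init(stress_init);
module_exit(stress_exit);
MODULE_LICENSE("GPL");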

At any rate, I've tested V2, and although I get a lot of output like this:

... scmi_xfer_begin: transfer_id=48379 msg_id=7 protocol_id=128 seq=0 poll=0
... scmi_rx_done: transfer_id=48379 msg_id=7 protocol_id=128 seq=0 msg_type=0
... scmi_xfer_end: transfer_id=48380 msg_id=8 protocol_id=128 seq=0 status=0
... scmi_xfer_begin: transfer_id=48381 msg_id=7 protocol_id=128 seq=0 poll=0
... scmi_rx_done: transfer_id=48381 msg_id=7 protocol_id=128 seq=0 msg_type=0
... scmi_xfer_end: transfer_id=48381 msg_id=7 protocol_id=128 seq=0 status=0
... scmi_xfer_begin: transfer_id=48382 msg_id=8 protocol_id=128 seq=0 poll=0
... scmi_rx_done: transfer_id=48382 msg_id=8 protocol_id=128 seq=0 msg_type=0
... scmi_xfer_end: transfer_id=48382 msg_id=8 protocol_id=128 seq=0 status=0
... scmi_xfer_begin: transfer_id=48383 msg_id=7 protocol_id=128 seq=0 poll=0


I also see a contiguous stretch of over 100 of these

... scmi_rx_done: transfer_id=48321 msg_id=7 protocol_id=128 seq=0 msg_type=0
... scmi_rx_done: transfer_id=48322 msg_id=8 protocol_id=128 seq=0 msg_type=0
... scmi_rx_done: transfer_id=48323 msg_id=7 protocol_id=128 seq=0 msg_type=0
... scmi_rx_done: transfer_id=48324 msg_id=8 protocol_id=128 seq=0 msg_type=0

which I cannot explain -- perhaps ftrace doesn't work well in interrupt context?
Note also that we use interrupts issued from the platform for both channels.

Regards,
Jim


>
> --
> Regards,
> Sudeep
>