Re: [RFC] dt-bindings: mailbox: add doorbell support to ARM MHU

From: Jassi Brar
Date: Fri Jun 05 2020 - 11:43:09 EST

On Fri, Jun 5, 2020 at 3:58 AM Sudeep Holla <sudeep.holla@xxxxxxx> wrote:
> > > > >> bash-1526 [000] 1149.472553: scmi_xfer_begin: transfer_id=1538 msg_id=6 protocol_id=21 seq=0 poll=0
> > > > >> <idle>-0 [001] 1149.472733: scmi_xfer_begin: transfer_id=1539 msg_id=7 protocol_id=19 seq=1 poll=1
> > > > >
> > > > Here another request is started before the first is finished.
> > >
> > > Ah, the prints are when the client requested. It is not when the mailbox
> > > started it. So this just indicates the beginning of the transfer from the
> > > client.
> > >
> > There maybe condition on a sensor read to finish within 1ms, but there
> > is no condition for the read to _start_ at this very moment (usually
> > there are sleeps in the path to sensor requests).
> >
> Again I wasn't clear. The trace logs are at the point just before calling
> mbox_send_messages. So any delay in sensor drivers won't get include. It
> is after the point sensor driver request to read the value and before we
> send the request via mailbox.
No, you were clear, I wasn't. Let me try again.

Since origin upto scmi_xfer, there can be many forms of sleep like
schedule/mutexlock etc.... think of some userspace triggering sensor
or dvfs operation. Linux does not provide real-time guarantees. Even
if remote (scmi) firmware guarantee RT response, it makes sense to
timeout a response only after the _request is on the bus_ and not
when you submit a request to the api (unless you serialise it).
IOW, start the timeout from mbox_client.tx_prepare() when the
message actually gets on the bus.

> > You have shared only 'bad' log without serialising access. Please
> > share log after serialising access to the channel and the 'good' log
> > with virtual channels. That should put the topic to rest.
> >
> I didn't realise that, sorry for missing that earlier. Attached both
> now, thanks for asking.
Interesting logs ! The time taken to complete _successful_ requests
are arguably better in bad_trace ... there are many <10usec responses
in bad_trace, while the fastest response in good_trace is 53usec.
And the requests that 'fail/timeout' are purely the result of not
serialising them or checkout for timeout at wrong place as explained