Re: [RFC] dt-bindings: mailbox: add doorbell support to ARM MHU

From: Sudeep Holla
Date: Thu Jun 11 2020 - 06:00:37 EST


Hi Viresh,

Thanks for summarising the thoughts quite nicely.

On Wed, Jun 10, 2020 at 03:03:34PM +0530, Viresh Kumar wrote:
> On 05-06-20, 10:42, Jassi Brar wrote:
> > Since origin up to scmi_xfer, there can be many forms of sleep like
> > schedule/mutexlock etc. Think of some userspace triggering a sensor
> > or dvfs operation. Linux does not provide real-time guarantees. Even
> > if remote (scmi) firmware guarantees RT response, it makes sense to
> > timeout a response only after the _request is on the bus_ and not
> > when you submit a request to the api (unless you serialise it).
> > IOW, start the timeout from mbox_client.tx_prepare() when the
> > message actually gets on the bus.
>
> There are multiple purposes of the timeout IMO:
>
> - Returning early if the other side is dead/hung. In such a case the
> timeout can be started when the request is put on the bus, as we don't
> care about the time it takes to complete the request until the time the
> request can be fulfilled. An example of this is an i2c/spi memory
> read.
>
> - Ensuring maximum time in which the request needs to be serviced.
> There may be hard requirements, as in the case of DVFS from the
> scheduler's hot path (which is essential for better working of the
> overall system). And for such a case the timeout is placed at the
> right place IMO, i.e. right after a request is submitted to mailbox.
>

Agreed on both points.
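
To make the two timeout start points concrete, here is a minimal
userspace sketch (not kernel code). The struct and function names are
hypothetical and exist only for illustration; the timestamps stand for
"submitted to the mailbox API", "actually on the bus" (roughly where
tx_prepare() would run) and "response received".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-request timestamps, in microseconds. */
struct req_times {
	uint64_t submit_us;	/* client hands the request to the mailbox API */
	uint64_t on_bus_us;	/* request actually goes out on the bus */
	uint64_t done_us;	/* response received from the remote */
};

/* Purpose 1: detect a dead/hung remote. The clock starts on the bus. */
static bool timed_out_from_bus(const struct req_times *t, uint64_t timeout_us)
{
	assert(t->submit_us <= t->on_bus_us && t->on_bus_us <= t->done_us);
	return t->done_us - t->on_bus_us > timeout_us;
}

/* Purpose 2: bound the total service time. The clock starts at submission. */
static bool timed_out_from_submit(const struct req_times *t, uint64_t timeout_us)
{
	assert(t->submit_us <= t->on_bus_us && t->on_bus_us <= t->done_us);
	return t->done_us - t->submit_us > timeout_us;
}
```

With made-up numbers, a request that waits 400 us in the queue and is
answered 50 us after hitting the bus passes the first check with a
100 us timeout but fails the second, which is exactly the difference
between the two purposes.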

> And some more points I wanted to share..
>
> - I am not sure I understood the *serializing* part you guys were
> talking about. I believe the mailbox framework is already serializing
> the requests it is receiving on a single channel with a spin lock,
> right? Why does the client need to serialize them as well? Is that
> for avoiding timeouts?
>
> - For me, and Sudeep as well IIUC, the bigger problem isn't that
> timeouts are happening and requests are failing (and so changing the
> timeout to a bigger value isn't going to fix anything), but the
> problem is that it is taking too long (because of the queue of
> requests on a channel) for a request to finish after being
> submitted. The scheduler doesn't care about the underlying logistics,
> for example; all it cares about is the time it takes to change the
> frequency of a CPU. If you can do it fast enough in a guaranteed
> manner, then you can use fast switching, otherwise not.
>
> - The hardware can very well support the case today where this can be
> done in parallel and (almost) in a guaranteed time-frame. While the
> software wants to add a limit to that and so wants to serialize
> requests.
>

+1

> - As many people have already suggested it (like me, Sudeep, Rob,
> maybe Bjorn as well), it seems silly to not allow driving the h/w in
> the most efficient way possible (and allow fast cpu switching in
> this case).
>
> > Interesting logs ! The time taken to complete _successful_ requests
> > are arguably better in bad_trace ... there are many <10usec responses
> > in bad_trace, while the fastest response in good_trace is 53usec.
>
> Indeed this is interesting. It may be worth looking (separately) into
> why we don't see those 3 us long requests anymore, or maybe they were
> just not there in the logs.
>

As I mentioned in another thread, non-DVFS requests may be prioritised
lower when there are parallel requests to the remote. The so-called bad
trace has no such scenario: it uses a single channel and all requests
from the OS are serialised. The good trace has 2 channels, so requests to
the remote happen in parallel, and hence it is fair to see slightly
higher latencies for the lower-priority requests.
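
As a rough illustration of the queueing effect, here is a toy model. It
assumes a fixed per-request service time and a remote that handles one
request at a time in the order given; both are simplifying assumptions,
not measured behaviour, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Latency seen by a request that has `ahead` requests in front of it,
 * when every request takes `service_us` microseconds to complete.
 */
static uint64_t queued_latency_us(unsigned int ahead, uint64_t service_us)
{
	assert(service_us > 0);
	return ((uint64_t)ahead + 1) * service_us;
}
```

With an illustrative 50 us service time, a DVFS request stuck behind 4
other requests on a single serialised channel sees
queued_latency_us(4, 50) = 250 us, while on its own channel it sees
queued_latency_us(0, 50) = 50 us. A lower-priority request that the
remote services after one parallel DVFS request sees
queued_latency_us(1, 50) = 100 us, i.e. slightly higher than it would
alone.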

--
Regards,
Sudeep