Re: [PATCH] firmware: arm_scmi: fix timeout value for send_message
From: Sudeep Holla
Date: Wed Jun 10 2020 - 04:23:26 EST
On Sun, Jun 07, 2020 at 02:30:23PM -0500, jassisinghbrar@xxxxxxxxx wrote:
> From: Jassi Brar <jaswinder.singh@xxxxxxxxxx>
>
> Currently scmi_do_xfer() submits a message to mailbox api and waits
> for an apparently very short time. This works if there are not many
> messages in the queue already. However, if many clients share a
> channel and/or each client submits many messages in a row, the
The recommendation in such scenarios is to use multiple channels.
> timeout value becomes too short and returns error even if the mailbox
> is working fine according to the load. The timeout occurs when the
> message is still in the api/queue awaiting its turn to ride the bus.
>
> Fix this by increasing the timeout value enough (500ms?) so that it
> fails only if there is an actual problem in the transmission (like a
> lockup or crash).
>
> [If we want to capture a situation when the remote didn't
> respond within expected latency, then the timeout should not
> start here, but from tx_prepare callback ... just before the
> message physically gets on the channel]
>
The bottleneck may not be in the remote. It may be the mailbox
serialising the requests even when it could parallelise them.
> Signed-off-by: Jassi Brar <jaswinder.singh@xxxxxxxxxx>
> ---
> drivers/firmware/arm_scmi/driver.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
> index dbec767222e9..46ddafe7ffc0 100644
> --- a/drivers/firmware/arm_scmi/driver.c
> +++ b/drivers/firmware/arm_scmi/driver.c
> @@ -303,7 +303,7 @@ int scmi_do_xfer(const struct scmi_handle *handle, struct scmi_xfer *xfer)
> }
>
> if (xfer->hdr.poll_completion) {
> - ktime_t stop = ktime_add_ns(ktime_get(), SCMI_MAX_POLL_TO_NS);
> + ktime_t stop = ktime_add_ns(ktime_get(), 500 * 1000 * NSEC_PER_USEC);
>
This is an unacceptable delay for schedutil fast_switch, so no for this one.
> spin_until_cond(scmi_xfer_done_no_timeout(cinfo, xfer, stop));
>
> @@ -313,7 +313,7 @@ int scmi_do_xfer(const struct scmi_handle *handle, struct scmi_xfer *xfer)
> ret = -ETIMEDOUT;
> } else {
> /* And we wait for the response. */
> - timeout = msecs_to_jiffies(info->desc->max_rx_timeout_ms);
> + timeout = msecs_to_jiffies(500);
In general, this hides issues in the remote. We are trying to move towards
at most 1ms per request, and with MBOX_QUEUE at 20, I see 20ms as more than
big enough. We have it set to 30ms now. 500ms is way too large and not
required IMO.
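
To spell out the arithmetic behind the 20ms figure, here is a minimal
sketch. The helper names and the constants are illustrative assumptions
taken from the numbers above (queue depth of 20, 1ms per request); they
are not part of the SCMI driver or the mailbox framework.

```c
/* Illustrative constants only, taken from the figures quoted above. */
#define QUEUE_DEPTH     20u /* assumed mailbox queue depth */
#define PER_REQUEST_MS   1u /* assumed worst-case service time per request */

/*
 * Hypothetical helper: worst-case wait for a message that enters a full
 * queue and is served last, with requests fully serialised.
 */
static unsigned int worst_case_wait_ms(unsigned int queue_depth,
				       unsigned int per_request_ms)
{
	return queue_depth * per_request_ms;
}

/* Hypothetical helper: non-zero if timeout_ms covers that worst case. */
static int timeout_covers_queue(unsigned int timeout_ms,
				unsigned int queue_depth,
				unsigned int per_request_ms)
{
	return timeout_ms >= worst_case_wait_ms(queue_depth, per_request_ms);
}
```

Under these assumptions a full queue drains in 20ms, so the current 30ms
timeout already leaves headroom, and 500ms would only delay detection of
a remote that has genuinely stopped responding.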
--
Regards,
Sudeep