Re: [PATCH 1/2] mailbox: switch to hrtimer for tx_complete polling

From: Jassi Brar
Date: Sun Jul 26 2015 - 23:26:48 EST


On Fri, Jul 24, 2015 at 2:17 PM, Sudeep Holla <sudeep.holla@xxxxxxx> wrote:
> On 24/07/15 06:02, Jassi Brar wrote:
>>
>> On Wed, Jul 22, 2015 at 5:58 PM, Sudeep Holla <sudeep.holla@xxxxxxx>
>> wrote:
>>
>>> we might end up waiting
>>> for at least a jiffy even though the response for that message from the
>>> remote is received via interrupt and processed at a much finer time
>>> granularity.
>>>
>> That is wrong.
>
> No see below.
>
>> If the controller supports a TX interrupt it should set txdone_irq,
>> which prevents polling, i.e. the controller driver calls
>> mbox_chan_txdone().
>>
>> If the controller doesn't support a TX interrupt but the client
>> receives some ack packet, then the client should set knows_txdone and
>> call mbox_client_txdone(). Again, you don't have to wait on polling.
>>
>
> Sorry if I was not clear in the commit message, but I thought I did
> mention TXDONE_BY_POLL. The case I am referring to is definitely not
> TXDONE_BY_IRQ or TXDONE_BY_ACK.
>
That statement is still wrong. The TXDONE_BY_POLL modifier doesn't make it right.

Anyway, I see you mean the third case, neither IRQ nor ACK.
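
Just to recap how the three cases are wired, here is a rough sketch
against the mailbox API (the handler names below are made up, not
from any driver in this thread):

============================
#include <linux/mailbox_client.h>
#include <linux/mailbox_controller.h>

/*
 * Case 1: the controller has a TX-done interrupt and registered with
 * mbox->txdone_irq = true; its IRQ handler reports completion itself.
 */
static void foo_ctlr_tx_irq(struct mbox_chan *chan)
{
	mbox_chan_txdone(chan, 0);	/* core never polls */
}

/*
 * Case 2: no TX interrupt, but the client receives a protocol-level
 * ack; it set cl->knows_txdone = true before requesting the channel.
 */
static void foo_client_got_ack(struct mbox_chan *chan)
{
	mbox_client_txdone(chan, 0);	/* again, no polling */
}

/*
 * Case 3: neither of the above. The controller sets mbox->txdone_poll
 * and mbox->txpoll_period, and the core keeps calling
 * chan->mbox->ops->last_tx_done() until it returns true. That is the
 * TXDONE_BY_POLL path your hrtimer patch changes.
 */
============================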

> A few reasons I have in mind as to why we need this:
>
> 1. The remote processor is capable of dealing with requests on the order
> of milliseconds. E.g. on Juno it reports a DVFS transition latency of
> 1.2ms. If we report that to the CPUFreq governor, we need to actually
> meet it. With the current jiffy-based timer we see latencies > 10ms.
>
Can I see the code somewhere?
It seems your remote doesn't send a protocol-level 'ack' packet
indicating whether the command was executed successfully or not. That
means Linux can't differentiate successful execution of a command from
a silent failure (the remote still has to set the TX-done flag to make
way for the next message). From Linux's POV our best bet seems to be
to submit the request and simply assume it done, i.e. probably do it
async rather than blocking.
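
Something like the following is what I mean; just a sketch with
made-up names (foo_tx_done / foo_send_async are not from your driver):

============================
#include <linux/device.h>
#include <linux/mailbox_client.h>

/* Called back by the core once it considers this TX complete */
static void foo_tx_done(struct mbox_client *cl, void *msg, int r)
{
	if (r)
		dev_warn(cl->dev, "message %p failed: %d\n", msg, r);
}

static int foo_send_async(struct mbox_chan *chan, struct mbox_client *cl,
			  void *msg)
{
	cl->tx_block = false;		/* don't sleep in mbox_send_message() */
	cl->tx_done = foo_tx_done;	/* get notified instead */

	/* queues the message and returns without waiting for TX-done */
	return mbox_send_message(chan, msg);
}
============================

Normally you would set tx_block/tx_done up once before
mbox_request_channel(), of course.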

> 2. Because of that, under stress testing with multiple clients active at
> a time, I am seeing the mailbox buffer overflows quite easily just
> because it's blocked on Tx polling(almost 10x slower) and doesn't
> process new requests though the remote can handle.
>
Yeah, this situation may arise. The following fix is needed regardless, I think.

============================
diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 07fd507..a1dded9 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -64,7 +64,12 @@ static void msg_submit(struct mbox_chan *chan)
 
 	spin_lock_irqsave(&chan->lock, flags);
 
-	if (!chan->msg_count || chan->active_req)
+	if (chan->active_req) {
+		err = 0;
+		goto exit;
+	}
+
+	if (!chan->msg_count)
 		goto exit;
 
 	count = chan->msg_count;
============================

> 3. Also, there are efforts towards scheduler-driven CPU frequency
> selection, where the desired latency should be as low as possible.
> Juri (cc-ed) is working on that and reported this issue with the
> mailbox core.
>
Sure, latency should be as low as possible. Sounds similar to (1). Was
it reported on some mailing list?

> Hope this clarifies the reasons for switching to hrtimer.
>
I am not against using an hrtimer; I just want to make sure we don't
simply suppress the symptoms of a wrong implementation.

Thanks.