Re: [PATCH v2 6/7] mailbox: bcm-flexrm-mailbox: Set msg_queue_len for each channel
From: Jassi Brar
Date: Fri Jul 28 2017 - 06:20:11 EST
On Fri, Jul 28, 2017 at 3:18 PM, Anup Patel <anup.patel@xxxxxxxxxxxx> wrote:
> On Fri, Jul 28, 2017 at 2:34 PM, Jassi Brar <jassisinghbrar@xxxxxxxxx> wrote:
>> On Fri, Jul 28, 2017 at 2:19 PM, Anup Patel <anup.patel@xxxxxxxxxxxx> wrote:
>>> On Thu, Jul 27, 2017 at 5:23 PM, Jassi Brar <jassisinghbrar@xxxxxxxxx> wrote:
>>>> On Thu, Jul 27, 2017 at 11:20 AM, Anup Patel <anup.patel@xxxxxxxxxxxx> wrote:
>>>>> On Thu, Jul 27, 2017 at 10:29 AM, Jassi Brar <jassisinghbrar@xxxxxxxxx> wrote:
>>>>
>>>>>>>>>>> Sorry for the delayed response...
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 21, 2017 at 9:16 PM, Jassi Brar <jassisinghbrar@xxxxxxxxx> wrote:
>>>>>>>>>>>> Hi Anup,
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 21, 2017 at 12:25 PM, Anup Patel <anup.patel@xxxxxxxxxxxx> wrote:
>>>>>>>>>>>>> The Broadcom FlexRM ring (i.e. mailbox channel) can handle
>>>>>>>>>>>>> larger number of messages queued in one FlexRM ring hence
>>>>>>>>>>>>> this patch sets msg_queue_len for each mailbox channel to
>>>>>>>>>>>>> be same as RING_MAX_REQ_COUNT.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Anup Patel <anup.patel@xxxxxxxxxxxx>
>>>>>>>>>>>>> Reviewed-by: Scott Branden <scott.branden@xxxxxxxxxxxx>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>> drivers/mailbox/bcm-flexrm-mailbox.c | 5 ++++-
>>>>>>>>>>>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/drivers/mailbox/bcm-flexrm-mailbox.c b/drivers/mailbox/bcm-flexrm-mailbox.c
>>>>>>>>>>>>> index 9873818..20055a0 100644
>>>>>>>>>>>>> --- a/drivers/mailbox/bcm-flexrm-mailbox.c
>>>>>>>>>>>>> +++ b/drivers/mailbox/bcm-flexrm-mailbox.c
>>>>>>>>>>>>> @@ -1683,8 +1683,11 @@ static int flexrm_mbox_probe(struct platform_device *pdev)
>>>>>>>>>>>>> 		ret = -ENOMEM;
>>>>>>>>>>>>> 		goto fail_free_debugfs_root;
>>>>>>>>>>>>> 	}
>>>>>>>>>>>>> -	for (index = 0; index < mbox->num_rings; index++)
>>>>>>>>>>>>> +	for (index = 0; index < mbox->num_rings; index++) {
>>>>>>>>>>>>> +		mbox->controller.chans[index].msg_queue_len =
>>>>>>>>>>>>> +						RING_MAX_REQ_COUNT;
>>>>>>>>>>>>> 		mbox->controller.chans[index].con_priv = &mbox->rings[index];
>>>>>>>>>>>>> +	}
>>>>>>>>>>>>>
>>>>>>>>>>>> While writing mailbox.c I wasn't unaware that there is the option to
>>>>>>>>>>>> choose the queue length at runtime.
>>>>>>>>>>>> The idea was to keep the code as simple as possible. I am open to
>>>>>>>>>>>> making it a runtime thing, but first, please help me understand how
>>>>>>>>>>>> that is useful here.
>>>>>>>>>>>>
>>>>>>>>>>>> I understand FlexRm has a ring buffer of RING_MAX_REQ_COUNT(1024)
>>>>>>>>>>>> elements. Any message submitted to mailbox api can be immediately
>>>>>>>>>>>> written onto the ringbuffer if there is some space.
>>>>>>>>>>>> Is there any mechanism to report back to a client driver, if its
>>>>>>>>>>>> message in ringbuffer failed "to be sent"?
>>>>>>>>>>>> If there isn't any, then I think, in flexrm_last_tx_done() you should
>>>>>>>>>>>> simply return true if there is some space left in the ring-buffer,
>>>>>>>>>>>> false otherwise.
>>>>>>>>>>>
>>>>>>>>>>> Yes, we have error code in "struct brcm_message" to report back
>>>>>>>>>>> errors from send_message. In our mailbox clients, we check
>>>>>>>>>>> return value of mbox_send_message() and also the error code
>>>>>>>>>>> in "struct brcm_message".
>>>>>>>>>>>
>>>>>>>>>> I meant after the message has been accepted in the ringbuffer but the
>>>>>>>>>> remote failed to receive it.
>>>>>>>>>
>>>>>>>>> Yes, even this case is handled.
>>>>>>>>>
>>>>>>>>> In case of IO errors after message has been put in ring buffer, we get
>>>>>>>>> completion message with error code and mailbox client drivers will
>>>>>>>>> receive back "struct brcm_message" with error set.
>>>>>>>>>
>>>>>>>>> You can refer to flexrm_process_completions() for more details.
>>>>>>>>>
>>>>>> It doesn't seem to be what I suggest. I see two issues in
>>>>>> flexrm_process_completions()
>>>>>> 1) It calls mbox_send_message(), which is a big NO for a controller
>>>>>> driver. Why should you have one more message stored outside of
>>>>>> ringbuffer?
>>>>>
>>>>> The "last_pending_msg" in each FlexRM ring was added to fit FlexRM
>>>>> in Mailbox framework.
>>>>>
>>>>> We don't have any IRQ for TX done so "txdone_irq" is out of the question for
>>>>> FlexRM. We only have completions for both success and failure (IO errors).
>>>>>
>>>>> This means we have to use "txdone_poll" for FlexRM. For "txdone_poll",
>>>>> we have to provide last_tx_done() callback. The last_tx_done() callback
>>>>> is supposed to return true if last send_data() call succeeded.
>>>>>
>>>>> To implement last_tx_done() in FlexRM driver, we added "last_pending_msg".
>>>>>
>>>>> When "last_pending_msg" is NULL it means last call to send_data() succeeded
>>>>> and when "last_pending_msg" is != NULL it means last call to send_data()
>>>>> did not go through due to lack of space in FlexRM ring.
>>>>>
>>>> It could be simpler.
>>>> Since flexrm_send_data() is essentially about putting the message in
>>>> the ring-buffer (and not about _transmission_ failures), the
>>>> last_tx_done() should simply return true if requests_ida has not all
>>>> ids allocated. False otherwise.
>>>
>>> It's not that simple because we have two cases in which
>>> send_data() will fail:
>>> 1. It runs out of IDs in requests_ida
>>> 2. There is no room in BD queue of FlexRM ring. This is because each
>>> brcm_message can be translated into variable number of descriptors.
>>> In fact, using SPU2 crypto client we have one brcm_message translating
>>> into 100's of descriptors. All-in-all few messages (< 1024) can also
>>> fill-up the BD queue of FlexRM ring.
>>>
>> OK let me put it abstractly... return false if "there is no space for
>> another message in the ringbuffer", true otherwise.
>
> Let's say at time T, there was no space in BD queue. Now at
> time T+X when last_tx_done() is called, it is possible that BD queue
> has space because FlexRM has processed some more
> descriptors.
>
> I think last_tx_done() for "txdone_poll" method will require
> some information passing from send_data() callback to
> last_tx_done() which is last_pending_msg for FlexRM driver.
>
The problem is flexrm_send_data() accepts single as well as batched
messages, so each send_data() can require a different amount of space. If you
make flexrm_send_data() accept fixed-size messages then you can simply
set a flag (say, last_tx_busy) when the maximum possible messages are queued
and unset that flag in flexrm_process_completions().
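
A minimal userspace sketch of the flag idea above, assuming fixed-size
messages. It is only a model: the model_* names and MODEL_RING_SLOTS are
hypothetical and do not exist in the FlexRM driver or in mailbox.c, and
the -1 return stands in for whatever error code a real driver would use.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; none of these names exist in the kernel. */
#define MODEL_RING_SLOTS 4	/* hypothetical stand-in for the ring capacity */

struct model_ring {
	unsigned int queued;	/* fixed-size messages currently in the ring */
	bool last_tx_busy;	/* set when the ring cannot take another message */
};

/* Models send_data(): accept one fixed-size message, or report busy. */
static int model_send_data(struct model_ring *ring)
{
	if (ring->queued == MODEL_RING_SLOTS)
		return -1;	/* a real driver would return an errno value */

	ring->queued++;
	if (ring->queued == MODEL_RING_SLOTS)
		ring->last_tx_busy = true;	/* max possible messages queued */
	return 0;
}

/* Models the completion path: one message finished, so a slot is free. */
static void model_process_completion(struct model_ring *ring)
{
	if (ring->queued == 0)
		return;

	ring->queued--;
	ring->last_tx_busy = false;
}

/* Models last_tx_done(): true while another message still fits. */
static bool model_last_tx_done(const struct model_ring *ring)
{
	return !ring->last_tx_busy;
}
```

With variable-size (batched) messages a single counter like this is not
enough, which is the problem described above.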
> Anyways, I plan to try "txdone_ack" method so I will
> remove last_tx_done() and last_pending_msg both.
> What do you think?
>
Sounds good.
>>
>>>>>>
>>>>>> 2) It calls mbox_chan_received_data() which is for messages received
>>>>>> from the remote. And not the way to report failed _transmission_, for
>>>>>> which the api calls back mbox_client.tx_done() . In your client
>>>>>> driver please populate mbox_client.tx_done() and see which message is
>>>>>> reported "sent fine" when.
>>>>>>
>>>>>>
>>>>>>>>>> There seems no such provision. IIANW, then you should be able to
>>>>>>>>>> consider every message as "sent successfully" once it is in the ring
>>>>>>>>>> buffer i.e, immediately after mbox_send_message() returns 0.
>>>>>>>>>> In that case I would think you don't need more than a couple of
>>>>>>>>>> entries out of MBOX_TX_QUEUE_LEN ?
>>>>>>>>>
>>>>>>>>> What I am trying to suggest is that we can take up to 1024 messages
>>>>>>>>> in a FlexRM ring but the MBOX_TX_QUEUE_LEN limits us from queuing
>>>>>>>>> more messages. This issue manifests easily when multiple CPUs
>>>>>>>>> queue to the same FlexRM ring (i.e. same mailbox channel).
>>>>>>>>>
>>>>>>>> OK then, I guess we have to make the queue length a runtime decision.
>>>>>>>
>>>>>>> Do you agree with approach taken by PATCH5 and PATCH6 to
>>>>>>> make queue length runtime?
>>>>>>>
>>>>>> I agree that we may have to get the queue length from platform, if
>>>>>> MBOX_TX_QUEUE_LEN is limiting performance. That will be easier on both
>>>>>> of us. However I suspect the right fix for _this_ situation is in
>>>>>> flexrm driver. See above.
>>>>>
>>>>> The current implementation is trying to model FlexRM using "txdone_poll"
>>>>> method and that's why we have dependency on MBOX_TX_QUEUE_LEN
>>>>>
>>>>> I think what we really need is new method for "txdone" to model ring
>>>>> manager HW (such as FlexRM). Let's call it "txdone_none".
>>>>>
>>>>> For "txdone_none", it means there is no "txdone" reporting in HW
>>>>> and mbox_send_message() should simply return the value returned by
>>>>> send_data() callback. The last_tx_done() callback is not required
>>>>> for "txdone_none" and MBOX_TX_QUEUE_LEN also has no
>>>>> effect on "txdone_none". Both blocking and non-blocking clients
>>>>> are treated same for "txdone_none".
>>>>>
>>>> That is already supported :)
>>>
>>> If you are referring to "txdone_ack" then this cannot be used here
>>> because for "txdone_ack" we have to call mbox_chan_txdone() API
>>> after writing descriptors in send_data() callback which will cause
>>> dead-lock in tx_tick() called by mbox_chan_txdone().
>>>
>> Did you read my code snippet below?
>>
>> It's not mbox_chan_txdone(), but mbox_client_txdone() which is called
>> by the client.
>>
>>>>
>>>> In drivers/dma/bcm-sba-raid.c
>>>>
>>>> sba_send_mbox_request(...)
>>>> {
>>>> 	......
>>>> 	req->msg.error = 0;
>>>> 	ret = mbox_send_message(sba->mchans[mchans_idx], &req->msg);
>>>> 	if (ret < 0) {
>>>> 		dev_err(sba->dev, "send message failed with error %d", ret);
>>>> 		return ret;
>>>> 	}
>>>> 	ret = req->msg.error;
>>>> 	if (ret < 0) {
>>>> 		dev_err(sba->dev, "message error %d", ret);
>>>> 		return ret;
>>>> 	}
>>>> 	.....
>>>> }
>>>>
>>>> Here you _do_ assume that as soon as the mbox_send_message() returns,
>>>> the last_tx_done() is true. In other words, this is a case of client
>>>> 'knows_txdone'.
>>>>
>>>> So ideally you should specify cl->knows_txdone = true during
>>>> mbox_request_channel() and have ...
>>>>
>>>> sba_send_mbox_request(...)
>>>> {
>>>> 	ret = mbox_send_message(sba->mchans[mchans_idx], &req->msg);
>>>> 	if (ret < 0) {
>>>> 		dev_err(sba->dev, "send message failed with error %d", ret);
>>>> 		return ret;
>>>> 	}
>>>>
>>>> 	ret = req->msg.error;
>>>>
>>>> 	/* Message successfully placed in the ringbuffer, i.e, done */
>>>> 	mbox_client_txdone(sba->mchans[mchans_idx], ret);
>>>>
>>>> 	if (ret < 0) {
>>>> 		dev_err(sba->dev, "message error %d", ret);
>>>> 		return ret;
>>>> 	}
>>>>
>>>> 	.....
>>>> }
>>>>
>>>
>>> I think we need to improve mailbox.c so that
>>> mbox_chan_txdone() can be called from
>>> send_data() callback.
>>>
>> No please. Other clients call mbox_send_message() followed by
>> mbox_client_txdone(), and they are right. For example,
>> drivers/firmware/tegra/bpmp.c
>
> OK so I got confused between mbox_chan_txdone() and
> mbox_client_txdone().
>
> We should do mbox_client_txdone() from mailbox client
> when mbox_chan txmethod is ACK.
>
Yes.
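
For illustration, a small userspace model of that client-side ACK flow.
This is a sketch under assumptions, not kernel code: every model_* name
is hypothetical, and the channel is simplified to a single in-flight slot
that rejects a second message, whereas the real mailbox.c queues messages
instead.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; these types are not the kernel's. */
struct model_msg {
	int error;	/* filled in by the controller, like brcm_message.error */
};

struct model_chan {
	struct model_msg *active_req;	/* message awaiting the client's ACK */
	unsigned int acked;		/* number of ACKs seen so far */
};

/* Models mbox_send_message() for a channel whose txdone method is ACK. */
static int model_send_message(struct model_chan *chan, struct model_msg *msg)
{
	if (chan->active_req)
		return -1;	/* previous message has not been ACKed yet */

	chan->active_req = msg;	/* controller placed it in the ring buffer */
	return 0;
}

/* Models mbox_client_txdone(): the client tells the API the TX is done. */
static void model_client_txdone(struct model_chan *chan, int status)
{
	(void)status;	/* the model does not report status any further */
	chan->active_req = NULL;
	chan->acked++;
}

/* Models the client's submit path: send, then ACK immediately. */
static int model_client_submit(struct model_chan *chan, struct model_msg *msg)
{
	int ret;

	msg->error = 0;
	ret = model_send_message(chan, msg);
	if (ret < 0)
		return ret;

	ret = msg->error;
	/* Message is in the ring buffer, i.e. done from the API's view. */
	model_client_txdone(chan, ret);
	return ret;
}
```

The point of the model is only the ordering: the channel is free again as
soon as the client ACKs, so back-to-back submits do not depend on polling.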
Thanks.