Re: [PATCH v5 09/13] mailbox: Add Gunyah message queue mailbox

From: Dmitry Baryshkov
Date: Mon Oct 17 2022 - 04:43:52 EST


On 14/10/2022 01:32, Elliot Berman wrote:


On 10/12/2022 2:47 PM, Dmitry Baryshkov wrote:
On 11/10/2022 03:08, Elliot Berman wrote:
+
+static irqreturn_t gh_msgq_tx_irq_handler(int irq, void *data)
+{
+    struct gunyah_msgq *msgq = data;
+
+    mbox_chan_txdone(gunyah_msgq_chan(msgq), 0);
+
+    return IRQ_HANDLED;
+}
+
+static void gh_msgq_txdone_tasklet(unsigned long data)
+{
+    struct gunyah_msgq *msgq = (struct gunyah_msgq *)data;
+
+    mbox_chan_txdone(gunyah_msgq_chan(msgq), msgq->last_status);

I don't quite get this. Why do you need both an IRQ and a tasklet?


I've tweaked the code comments as well to explain this a bit better.

Gunyah tells us in the hypercall itself whether the message queue is full. Once the message queue is full, Gunyah will let us know via the tx_irq when the reader starts draining the queue and we can start adding more messages.

One point to note: the last message to be sent into the message queue that makes the queue full can be detected. The hypercall reports that the message was sent (GH_ERROR_OK) and the "ready" return value is false. In its current form, the msgq mailbox driver should never make a send hypercall and get GH_ERROR_MSGQUEUE_FULL because the driver properly tracks when the message queue is full.

When the mailbox driver reports txdone, the implication is that more messages can be sent (not just that the message was transmitted). In typical operation, the msgq mailbox driver can immediately report that the message was sent and no tx_irq happens because the hypercall returns GH_ERROR_OK and ready=true. The mailbox framework doesn't allow txdone directly from the send_data callback. To work around that, Jassi recommended we use a tasklet [1]. In the "atypical" case where the message queue becomes full, we get GH_ERROR_OK and ready=false. In that case, we don't report txdone right away with the tasklet and instead wait for the tx_irq to know when more messages can be sent.

Can we please get some sort of this information into the comments in the source file?


[1]: The tasklet works because send_data is called from the mailbox framework with interrupts disabled. Once interrupts are re-enabled, txdone is allowed to happen, which is also when the tasklet runs.
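
For readers following the thread, the txdone decision described above can be sketched as a small user-space model. This is not the driver code: struct msgq_model and the model_* functions are hypothetical names, the enum values are placeholders (only the names GH_ERROR_OK and GH_ERROR_MSGQUEUE_FULL come from the thread), and the choice of -EIO for a failed send is an assumption. It only illustrates which path reports txdone for each hypercall outcome.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder values: only the names come from the thread. */
enum gh_error { GH_ERROR_OK, GH_ERROR_MSGQUEUE_FULL };

/* Hypothetical user-space model of the per-queue driver state. */
struct msgq_model {
	bool tasklet_pending;	/* txdone will be reported by the tasklet */
	bool wait_tx_irq;	/* queue full: txdone will come from tx_irq */
	int last_status;	/* status the tasklet hands to txdone */
	int txdone_count;	/* number of txdone reports so far */
	int reported_status;	/* status of the most recent txdone */
};

/* Stand-in for mbox_chan_txdone(): it only records the report. */
static void model_txdone(struct msgq_model *m, int status)
{
	m->txdone_count++;
	m->reported_status = status;
}

/* Decide how txdone is reported once the send hypercall has returned. */
static void model_after_send(struct msgq_model *m, enum gh_error err, bool ready)
{
	/* The model assumes a single message in flight at a time. */
	assert(!m->tasklet_pending && !m->wait_tx_irq);

	if (err != GH_ERROR_OK) {
		/* Assumption: the failure is surfaced to the client via txdone. */
		m->last_status = -EIO;
		m->tasklet_pending = true;
	} else if (ready) {
		/* Typical case: sent and the queue has room, so no tx_irq will fire. */
		m->last_status = 0;
		m->tasklet_pending = true;
	} else {
		/* Atypical case: sent, but this message filled the queue. */
		m->last_status = 0;
		m->wait_tx_irq = true;
	}
}

/* Models the tasklet that runs after send_data has returned. */
static void model_tasklet(struct msgq_model *m)
{
	if (!m->tasklet_pending)
		return;
	m->tasklet_pending = false;
	model_txdone(m, m->last_status);
}

/* Models the tx interrupt raised when the reader starts draining the queue. */
static void model_tx_irq(struct msgq_model *m)
{
	if (!m->wait_tx_irq)
		return;
	m->wait_tx_irq = false;
	model_txdone(m, 0);
}
```

In this model, exactly one of the two paths reports txdone for any given message: the tasklet when more messages can be sent right away (or the send failed), and the tx_irq when the queue became full.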

+
+    /**
+     * EAGAIN: message didn't send.
+     * ret = 1: message sent, but now the message queue is full and we can't send any more msgs.
+     * Either way, don't report that this message is done.
+     */
+    if (ret == -EAGAIN || ret == 1)
+        return ret;

'1' doesn't seem to be a valid return code for _send_data.

Also it would be logical to return any error here, not just -EAGAIN.



If I return an error to the mailbox framework, then the message is stuck: clients don't know that there was some underlying transport failure. It would be retried if the client sends another message, but there is no guarantee either that retrying later would work (what would have changed?) or that the client would send another message to trigger the retry. If the message is malformed or the message queue is not correctly set up, the client would never know. The client should be told that the message wasn't sent.

I see. msg_submit() doesn't propagate the error.



+int gunyah_msgq_init(struct device *parent, struct gunyah_msgq *msgq, struct mbox_client *cl,
+             struct gunyah_resource *tx_ghrsc, struct gunyah_resource *rx_ghrsc)

Are the message queues allocated/created dynamically or statically? If the latter is true, please use devm_request(_threaded)_irq and devm_kzalloc.


With the exception of resource manager, message queues are created dynamically.

P.S. Thanks for all the other suggestions in this and the other patches, I've applied them.

Thanks,
Elliot

--
With best wishes
Dmitry