Re: [PATCH v3 03/34] mailbox: vpu-ipc-mailbox: Add support for Intel VPU IPC mailbox

From: Alessandrelli, Daniele
Date: Mon Feb 01 2021 - 10:51:16 EST


Hi Jassi,

Thank you very much for your feedback.

On Mon, 2021-02-01 at 01:07 -0600, Jassi Brar wrote:
> On Fri, Jan 29, 2021 at 8:21 PM <mgross@xxxxxxxxxxxxxxx> wrote:
> > From: Daniele Alessandrelli <daniele.alessandrelli@xxxxxxxxx>
> >
> > Add mailbox controller enabling inter-processor communication (IPC)
> > between the CPU (aka, the Application Processor - AP) and the VPU on
> > Intel Movidius SoCs like Keem Bay.
> >
> > The controller uses HW FIFOs to enable such communication. Specifically,
> > there are two FIFOs, one for the CPU and one for VPU. Each FIFO can hold
> > 128 entries (messages) of 32-bit each (but only 26 bits are actually
> > usable, since the 6 least-significant bits are reserved).
> >
> > When the Linux kernel on the AP needs to send messages to the VPU
> > firmware, it writes them to the VPU FIFO; similarly, when the VPU
> > firmware needs to send messages to the AP, it writes them to the CPU
> > FIFO.
> >
> > The AP is notified of pending messages in the CPU FIFO by means of the
> > 'FIFO-not-empty' interrupt, which is generated by the CPU FIFO while not
> > empty. This interrupt is cleared automatically once all messages have
> > been read from the FIFO (i.e., the FIFO has been emptied).
> >
> > The hardware doesn't provide an TX done IRQ (i.e., an IRQ that allows
> > the VPU firmware to notify the AP that the message put into the VPU FIFO
> > has been received); however the AP can ensure that the message has been
> > successfully put into the VPU FIFO (and therefore transmitted) by
> > checking the VPU FIFO status register to ensure that writing the message
> > didn't cause the FIFO to overflow.
> >
> > Therefore, the mailbox controller is configured as capable of tx_done
> > IRQs and a tasklet is used to simulate the tx_done IRQ. The tasklet is
> > activated by send_data() right after the message has been put into the
> > VPU FIFO and the VPU FIFO status registers has been checked. If an
> > overflow is reported by the status register, the tasklet passes -EBUSY
> > to mbox_chan_txdone(), to notify the mailbox client of the failed TX.
> >
> > The client should therefore register a tx_done() callback to properly
> > handle failed transmissions.
> >
> > Note: the 'txdone_poll' mechanism cannot be used because it doesn't
> > provide a way to report a failed transmission.
> >
> txdone means the last submitted transfer has been done with --
> successfully or not.

Yes, that's usually the case, but not for poll mode (at least from what
I can tell).

last_tx_done() can return only true or false, but when true is
returned, there is no way to know if the TX was successful or not; the
mailbox framework just seems to assume that 'true' means "message
successfully transmitted", since 0 is passed to tx_tick() (end
eventually to the tx_done() client callback):
https://github.com/torvalds/linux/blob/master/drivers/mailbox/mailbox.c#L131

If 'false' is returned, the polling continues until the timeout is reached.

(please correct me if my above understanding is wrong)

> So I think we can do without the tasklet as explained below....
>
> ....
>
> > +static int vpu_ipc_mailbox_send_data(struct mbox_chan *chan, void *data)
> > +{
> > + struct vpu_ipc_mbox *vpu_ipc_mbox = chan->con_priv;
> > + u32 entry, overflow;
> > +
> > + entry = *((u32 *)data);
> > +
> Are all messages max 32bits wide?
> Usually the controller specifies a packet format (more than just a
> word but of course that's not mandatory) that a client submits the
> data to be transmitted in. Esp when it has deep FIFOs.

It's actually only 26 bits, since the last 6 bits are reserved for the
overflow detection mechanism; I should probably have explained this
better in the commit message, sorry!

Basically, the AP is not the only one that can write to the VPU FIFO:
other components within the SoC can write to it too. Each of these
components has a unique 6-bit processor ID associated to it. The HW
FIFO expects that the last 6 bits of each 32-bit FIFO entry contain the
processor ID of the sender.

Sending a message works as follows:
1. The message must be a 32-bit value with the last 6-bit set to 0 (in
practice, the message is meant to be a 32-bit address value, aligned
to 64 bytes).
2. The sender adds its processor ID to the 32-bit message being sent: M
= m | ProcID
3. The sender writes the message (M) to the TIM_IPC_FIFO register
4. The HW atomically checks if the FIFO is full and if not it writes it
to the actual FIFO; if the FIFO is full, the HW reads the ProcID
from M and then sets the corresponding bit of TIM_IPC_FIFO_OF_FLAG0,
to signal that the write failed, because the FIFO was full
5. The sender reads the TIM_IPC_FIFO_OF_FLAG0 register and checks if
the bit corresponding to its ProcID has been set (in order to know
if the TX succeeded or failed); if the bit is set, the sender clears
it.

Note: as briefly mentioned above, the 32-bit value is meant to be a 32-
bit physical address (64-byte aligned). This address points to a
predefined struct (i.e., the IPC packet) in shared memory. However,
since this struct is not HW dependent (it's just the struct the VPU
firmware expects and in theory it could change if a different VPU FW is
used), I didn't define it here, but in the Keem Bay IPC driver (patch 4
and 5), which is the mailbox client of this controller (does this
design seem reasonable to you?)

>
> > + /* Ensure last 6-bits of entry are not used. */
> > + if (unlikely(entry & IPC_FIFO_ENTRY_RSVD_MASK)) {
> > + vpu_ipc_mbox->txdone_result = -EINVAL;
> > + goto exit;
> > + }
> > +
> > + /* Add processor ID to entry. */
> > + entry |= IPC_FIFO_ID_CPU & IPC_FIFO_ENTRY_RSVD_MASK;
> > +
> > + /* Write entry to VPU FIFO. */
> > + iowrite32(entry, vpu_ipc_mbox->vpu_fifo_base + IPC_FIFO);
> > +
> > + /* Check if we overflew the VPU FIFO. */
> > + overflow = ioread32(vpu_ipc_mbox->vpu_fifo_base + IPC_FIFO_OF_FLAG0) &
> > + BIT(IPC_FIFO_ID_CPU);
> > + if (unlikely(overflow)) {
> > + /* Reset overflow register. */
> > + iowrite32(BIT(IPC_FIFO_ID_CPU),
> > + vpu_ipc_mbox->vpu_fifo_base + IPC_FIFO_OF_FLAG0);
> > + vpu_ipc_mbox->txdone_result = -EBUSY;
> > + goto exit;
> > + }
> > + vpu_ipc_mbox->txdone_result = 0;
> > +
> > +exit:
> > + /* Schedule tasklet to call mbox_chan_txdone(). */
> > + tasklet_schedule(&vpu_ipc_mbox->txdone_tasklet);
> > +
> > + return 0;
> > +}
> > +
> Maybe set txdone_poll and implement last_tx_done() where you can wait
> for FIFO to have enough space for another message, so that the next
> submitted request will never return -EBUSY.

I cannot do that because the AP is not the only one writing to the VPU
FIFO (sorry again for not mentioning this in the commit message) and
therefore, even if at time 't' there is enough space for another
message, at time 't+1' (when the driver actually tries to send the
message) the FIFO might be full.

I think that in order for me to avoid using the tasklet we might have
to change the mailbox framework itself.

For instance, we could modify the last_tx_done() to return 0 on success
and negative error code on failure. Then we can agree that if -EAGAIN
is returned, the mailbox framework will keep polling, while if the
error is something else, the error will be passed to tx_tick() (and
eventually to the tx_done callback).

However, for my specific case, the best solution would probably be to
modify how send_data() return values are used (so that I can avoid the
polling delay). At the moment, my understanding is that when
send_data() reutrn -EBUSY, the message is left into the mailbox channel
queue (and will be transmitted at a later stage). While I would
probably benefit from having an option to return the error to the
client, so that it's up to the client to decide if it's worth trying
the re-transmission or not.

Anyway, the current tasklet solution works fine with me, so I'm happy
to stick with it if that's fine with you.


>
> thanks