Re: [PATCH 03/22] keembay-ipc: Add Keem Bay IPC module
From: Daniele Alessandrelli
Date: Thu Dec 10 2020 - 13:41:44 EST
On Tue, 2020-12-08 at 20:48 +0100, Greg KH wrote:
> On Tue, Dec 08, 2020 at 06:59:09PM +0000, Daniele Alessandrelli wrote:
> > Hi Greg,
> >
> > Thanks for your feedback.
> >
> > On Wed, 2020-12-02 at 07:19 +0100, Greg KH wrote:
> > > On Tue, Dec 01, 2020 at 02:34:52PM -0800, mgross@xxxxxxxxxxxxxxx wrote:
> > > > From: Daniele Alessandrelli <daniele.alessandrelli@xxxxxxxxx>
> > > >
> > > > On the Intel Movidius SoC code named Keem Bay, communication between the
> > > > Computing Sub-System (CSS), i.e., the CPU, and the Multimedia Sub-System
> > > > (MSS), i.e., the VPU is enabled by the Keem Bay Inter-Processor
> > > > Communication (IPC) mechanism.
> > > >
> > > > Add the driver for using Keem Bay IPC from within the Linux Kernel.
> > > >
> > > > Keem Bay IPC uses the following terminology:
> > > >
> > > > - Node: A processing entity that can use the IPC to communicate;
> > > > currently, we just have two nodes, CPU (CSS) and VPU (MSS).
> > > >
> > > > - Link: Two nodes that can communicate over IPC form an IPC link
> > > > (currently, we just have one link, the one between the CPU
> > > > and VPU).
> > > >
> > > > - Channel: An IPC link can provide multiple IPC channels. IPC channels
> > > > allow communication multiplexing, i.e., the same IPC link can
> > > > be used by different applications for different
> > > > communications. Each channel is identified by a channel ID,
> > > > which must be unique within a single IPC link. Channels are
> > > > divided in two categories, High-Speed (HS) channels and
> > > > General-Purpose (GP) channels. HS channels have higher
> > > > priority over GP channels.
> > > >
> > > > Keem Bay IPC mechanism is based on shared memory and hardware FIFOs.
> > > > Both the CPU and the VPU have their own hardware FIFO. When the CPU
> > > > wants to send an IPC message to the VPU, it writes to the VPU FIFO (MSS
> > > > FIFO); similarly, when MSS wants to send an IPC message to the CPU, it
> > > > writes to the CPU FIFO (CSS FIFO).
> > > >
> > > > A FIFO entry is simply a pointer to an IPC buffer (aka IPC header)
> > > > stored in a portion of memory shared between the CPU and the VPU.
> > > > Specifically, the FIFO entry contains the (VPU) physical address of the
> > > > IPC buffer being transferred.
> > > >
> > > > In turn, the IPC buffer contains the (VPU) physical address of the
> > > > payload (which must be located in shared memory too) as well as other
> > > > information (payload size, IPC channel ID, etc.).
> > > >
> > > > Each IPC node instantiates a pool of IPC buffers from its own IPC buffer
> > > > memory region. When instantiated, IPC buffers are marked as free. When
> > > > the node needs to send an IPC message, it gets the first free buffer it
> > > > finds (from its own pool), marks it as allocated (used), and puts its
> > > > physical address into the IPC FIFO of the destination node. The
> > > > destination node (which is notified by an interrupt when there are
> > > > entries pending in its FIFO) extract the physical address from the FIFO
> > > > and process the IPC buffer, marking it as free once done (so that the
> > > > sender can reuse the buffer).
> > >
> > > Any reason you can't use the dmabuf interface for these memory buffers
> > > you are creating and having to manage "by hand"? I thought that was
> > > what the kernel was wanting to unify on such that individual
> > > drivers/subsystems didn't have to do this on their own.
> >
> > My understanding is that the dmabuf interface is used to share DMA
> > buffers across different drivers, while these buffers are used only
> > internally to the IPC driver (and exchanged only with the VPU
> > firmware). They basically are packet headers that are sent to the VPU.
>
> There's no reason you couldn't use these to share your buffers
> "internally" as well, because you have the same lifetime rules and
> accounting and all other sorts of things you have to handle, right? Why
> rewrite something like this when you should take advantage of common
> code instead?
I looked at dma-buf again, but I'm still failing to see how we can use
it in this driver :/
The problem I'm not able to solve is exactly how to match the lifetime
of this IPC packets (IPC buffers is probably a misleading name, my bad
for using it in the code) with the dma-buf lifetime rules.
Basically, these IPC packets are cache-aligned (64 bytes) and have the
following fixed structure:
struct kmb_ipc_buf {
u32 data_addr;
u32 data_size;
u16 channel;
u8 src_node;
u8 dst_node;
u8 status;
} __packed __aligned(KMB_IPC_ALIGNMENT);
Now, let's ignore the first 5 fields and focus on the last one that
controls the lifetime of the packet. The status field can be either
FREE or ALLOCATED (perhaps should be renamed to IN_USE).
Basically, during probe, the driver allocates an array (pool) of these
packets from its 'local' reserved memory region. This is done using
dma_alloc_coherent(). This array remains allocated until remove() is
called.
At run-time, every time the driver sends a message to the VPU, the
following happens:
1. The IPC driver loops through the array of 'local' packets to find
the first one marked as FREE
2. The IPC driver marks the packet as IN_USE and shares it with the VPU
by putting its physical address into the VPU HW FIFO.
3. The VPU Firmware processes the packet and, when done with it, it
marks it as FREE again so that the IPC driver can use it again.
Please note that the IPC driver is not notified by the VPU when the
packet is marked as free (there is no IRQ or other async mechanism to
do that). The driver will simply find the message free the next time it
loops over the message pool again (i.e., when it is trying to send a
new message).
Given that the free operation is silently done by the VPU, I'm failing
to see how I can use dmabuf (or gen_pool or dma_pool). Is there
anything that I'm missing?