Re: [RESEND RFC/PATCH 3/8] media: platform: mtk-vpu: Support Mediatek VPU

From: Daniel Thompson
Date: Wed Nov 25 2015 - 11:11:47 EST


On 17/11/15 12:54, Tiffany Lin wrote:
From: Andrew-CT Chen <andrew-ct.chen@xxxxxxxxxxxx>

The VPU driver for hw video codec embedded in Mediatek's MT8173 SOCs.
It is able to handle video decoding/encoding of in a range of formats.
The driver provides with VPU firmware download, memory management and
the communication interface between CPU and VPU.
For VPU initialization, it will create virtual memory for CPU access and
IOMMU address for vcodec hw device access. When a decode/encode instance
opens a device node, vpu driver will download vpu firmware to the device.
A decode/encode instant will decode/encode a frame using VPU
interface to interrupt vpu to handle decoding/encoding jobs.

Signed-off-by: Andrew-CT Chen <andrew-ct.chen@xxxxxxxxxxxx>
---
drivers/media/platform/Kconfig | 6 +
drivers/media/platform/Makefile | 2 +
drivers/media/platform/mtk-vpu/Makefile | 1 +
.../platform/mtk-vpu/h264_enc/venc_h264_vpu.h | 127 +++
.../media/platform/mtk-vpu/include/venc_ipi_msg.h | 212 +++++
drivers/media/platform/mtk-vpu/mtk_vpu_core.c | 823 ++++++++++++++++++++
drivers/media/platform/mtk-vpu/mtk_vpu_core.h | 161 ++++
.../media/platform/mtk-vpu/vp8_enc/venc_vp8_vpu.h | 119 +++
8 files changed, 1451 insertions(+)
create mode 100644 drivers/media/platform/mtk-vpu/Makefile
create mode 100644 drivers/media/platform/mtk-vpu/h264_enc/venc_h264_vpu.h
create mode 100644 drivers/media/platform/mtk-vpu/include/venc_ipi_msg.h
create mode 100644 drivers/media/platform/mtk-vpu/mtk_vpu_core.c
create mode 100644 drivers/media/platform/mtk-vpu/mtk_vpu_core.h
create mode 100644 drivers/media/platform/mtk-vpu/vp8_enc/venc_vp8_vpu.h

diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index ccbc974..f98eb47 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -148,6 +148,12 @@ config VIDEO_CODA
Coda is a range of video codec IPs that supports
H.264, MPEG-4, and other video formats.

+config MEDIATEK_VPU
+ bool "Mediatek Video Processor Unit"
+ ---help---
+ This driver provides downloading firmware vpu and
+ communicating with vpu.
+

This looks pretty broken.

Shouldn't this option be tristate? Why are there no depends-on or selects?

Also I think the help text could be more descriptive here (and so does checkpatch ;-) ).


diff --git a/drivers/media/platform/mtk-vpu/h264_enc/venc_h264_vpu.h b/drivers/media/platform/mtk-vpu/h264_enc/venc_h264_vpu.h
new file mode 100644
index 0000000..9c8ebdd
--- /dev/null
+++ b/drivers/media/platform/mtk-vpu/h264_enc/venc_h264_vpu.h

Why is this file included in *this* patch? It is not included by anything in the patch and defines functions that do not exist yet.


diff --git a/drivers/media/platform/mtk-vpu/include/venc_ipi_msg.h b/drivers/media/platform/mtk-vpu/include/venc_ipi_msg.h
new file mode 100644
index 0000000..a345b98

This file also is not included by anything and should, perhaps be included in a different patch.


diff --git a/drivers/media/platform/mtk-vpu/mtk_vpu_core.c b/drivers/media/platform/mtk-vpu/mtk_vpu_core.c
new file mode 100644
index 0000000..b524dbc
--- /dev/null
+++ b/drivers/media/platform/mtk-vpu/mtk_vpu_core.c
@@ -0,0 +1,823 @@
+/*
+* Copyright (c) 2015 MediaTek Inc.
+* Author: Andrew-CT Chen <andrew-ct.chen@xxxxxxxxxxxx>
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License version 2 as
+* published by the Free Software Foundation.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+* GNU General Public License for more details.
+*/
+#include <linux/clk.h>
+#include <linux/debugfs.h>
+#include <linux/firmware.h>
+#include <linux/interrupt.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/of_address.h>
+#include <linux/of_irq.h>
+#include <linux/of_platform.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+
+#include "mtk_vpu_core.h"
+
+/**
+ * VPU (video processor unit) is a tiny processor controlling video hardware
+ * related to video codec, scaling and color format converting.
+ * VPU interfaces with other blocks by share memory and interrupt.
+ **/
+#define MTK_VPU_DRV_NAME "mtk_vpu"

This is only used once, why not just put this string directly into the .name field?

+
+#define INIT_TIMEOUT_MS 2000U
+#define IPI_TIMEOUT_MS 2000U
+#define VPU_FW_VER_LEN 16
+
+/* vpu extended virtural address */
+#define VPU_PMEM0_VIRT(vpu) ((vpu)->mem.p_va)
+#define VPU_DMEM0_VIRT(vpu) ((vpu)->mem.d_va)
+/* vpu extended iova address*/
+#define VPU_PMEM0_IOVA(vpu) ((vpu)->mem.p_iova)
+#define VPU_DMEM0_IOVA(vpu) ((vpu)->mem.d_iova)

These feel like obfuscation to me. The code looks clearer without the macro For example:

vpu->mem.p_va = dma_alloc_coherent(dev, ...);

Is much clearer than:

VPU_PMEM0_VIRT(vpu) = dma_alloc_coherent(dev, ...);

+
+#define VPU_PTCM(dev) ((dev)->reg.sram)
+#define VPU_DTCM(dev) ((dev)->reg.sram + VPU_DTCM_OFFSET)

These macros also seem to add very little value. They are consumed only a couple of times and only serve to conceal how the sram is mapped and consumed:

dest = vpu->reg.sram;
if (fw_type == D_FW)
dest += VPU_DTCM_OFFSET;

Compared to:

dest = fw_type ? VPU_DTCM(vpu) : VPU_PTCM(vpu);

+
+#define VPU_PTCM_SIZE (96 * SZ_1K)
+#define VPU_DTCM_SIZE (32 * SZ_1K)
+#define VPU_DTCM_OFFSET 0x18000UL
+#define VPU_EXT_P_SIZE SZ_1M
+#define VPU_EXT_D_SIZE SZ_4M
+#define VPU_P_FW_SIZE (VPU_PTCM_SIZE + VPU_EXT_P_SIZE)
+#define VPU_D_FW_SIZE (VPU_DTCM_SIZE + VPU_EXT_D_SIZE)
+#define SHARE_BUF_SIZE 48
+
+#define VPU_P_FW "vpu_p.bin"
+#define VPU_D_FW "vpu_d.bin"

These macros are of questionable value.

+
+#define VPU_BASE 0x0

Is this register really called "base" in the datasheet? From the use in the code it looks like it performs a reset and/or PM function.


+#define VPU_TCM_CFG 0x0008
+#define VPU_PMEM_EXT0_ADDR 0x000C
+#define VPU_PMEM_EXT1_ADDR 0x0010
+#define VPU_TO_HOST 0x001C
+#define VPU_DMEM_EXT0_ADDR 0x0014
+#define VPU_DMEM_EXT1_ADDR 0x0018
+#define HOST_TO_VPU 0x0024
+#define VPU_PC_REG 0x0060
+#define VPU_WDT_REG 0x0084
+
+/* vpu inter-processor communication interrupt */
+#define VPU_IPC_INT BIT(8)
+
+/**
+ * enum vpu_fw_type - VPU firmware type
+ *
+ * @P_FW: program firmware
+ * @D_FW: data firmware
+ *
+ */
+enum vpu_fw_type {
+ P_FW,
+ D_FW,
+};
+
+/**
+ * struct vpu_mem - VPU memory information

Perhaps "VPU extended program/data memory information" instead.


+ *
+ * @p_va: the kernel virtual memory address of
+ * VPU extended program memory
+ * @d_va: the kernel virtual memory address of VPU extended data memory
+ * @p_iova: the iova memory address of VPU extended program memory
+ * @d_iova: the iova memory address of VPU extended data memory
+ */
+struct vpu_mem {
+ void *p_va;
+ void *d_va;
+ dma_addr_t p_iova;
+ dma_addr_t d_iova;
+};

Might be better as:

struct {
void *va;
dma_addr_t iova;
}

This can be then be used as: vpu->mem[P_FW].va . This doesn't matter much yet but it would helps us common up some code later.


+
+/**
+ * struct vpu_regs - VPU SRAM and configuration registers
+ *
+ * @sram: the register for VPU sram
+ * @cfg: the register for VPU configuration
+ * @irq: the irq number for VPU interrupt
+ */
+struct vpu_regs {
+ void __iomem *sram;

This is called TCM everywhere else. Why a different name for it here?


+ void __iomem *cfg;
+ int irq;
+};
+
+/**
+ * struct vpu_run - VPU initialization status
+ *
+ * @signaled: the signal of vpu initialization completed
+ * @fw_ver: VPU firmware version
+ * @wq: wait queue for VPU initialization status
+ */
+struct vpu_run {
+ u32 signaled;
+ char fw_ver[VPU_FW_VER_LEN];
+ wait_queue_head_t wq;
+};
+
+/**
+ * struct vpu_ipi_desc - VPU IPI descriptor
+ *
+ * @handler: IPI handler
+ * @name: the name of IPI handler
+ * @priv: the private data of IPI handler
+ */
+struct vpu_ipi_desc {
+ ipi_handler_t handler;
+ const char *name;
+ void *priv;
+};
+
+/**
+ * struct share_obj - The DTCM (Data Tightly-Coupled Memory) buffer shared with
+ * AP and VPU

Remove "The" (there are more than one of these buffers).

+ *
+ * @id: IPI id
+ * @len: share buffer length
+ * @share_buf: share buffer data
+ */
+struct share_obj {
+ int32_t id;
+ uint32_t len;
+ unsigned char share_buf[SHARE_BUF_SIZE];
+};
+
+/**
+ * struct mtk_vpu - vpu driver data
+ * @mem: VPU extended memory information
+ * @reg: VPU SRAM and configuration registers
+ * @run: VPU initialization status
+ * @ipi_desc: VPU IPI descriptor
+ * @recv_buf: VPU DTCM share buffer for receiving. The
+ * receive buffer is only accessed in interrupt context.
+ * @send_buf: VPU DTCM share buffer for sending
+ * @dev: VPU struct device
+ * @clk: VPU clock on/off
+ * @vpu_mutex: protect mtk_vpu (except recv_buf) and ensure only
+ * one client to use VPU service at a time. For example,
+ * suppose a client is using VPU to decode VP8.
+ * If the other client wants to encode VP8,
+ * it has to wait until VP8 decode completes.
+ *
+ */
+struct mtk_vpu {
+ struct vpu_mem mem;

Rename to extmem?

+ struct vpu_regs reg;
+ struct vpu_run run;
+ struct vpu_ipi_desc ipi_desc[IPI_MAX];
+ struct share_obj *recv_buf;
+ struct share_obj *send_buf;
+ struct device *dev;
+ struct clk *clk;
+ struct mutex vpu_mutex; /* for protecting vpu data data structure */
+};
+
+/* the thread calls the function should hold the |vpu_mutex| */

Remove this comment. Its unhelpful: the code does not meet this requirement because it overstates the scope of vpu_mutex .

+static inline void vpu_cfg_writel(struct mtk_vpu *vpu, u32 val, u32 offset)
+{
+ writel(val, vpu->reg.cfg + offset);
+}
+
+static inline u32 vpu_cfg_readl(struct mtk_vpu *vpu, u32 offset)
+{
+ return readl(vpu->reg.cfg + offset);
+}
+
+static inline bool vpu_running(struct mtk_vpu *vpu)
+{
+ return vpu_cfg_readl(vpu, VPU_BASE) & BIT(0);
+}
+
+void vpu_disable_clock(struct platform_device *pdev)
+{
+ struct mtk_vpu *vpu = platform_get_drvdata(pdev);
+
+ /* Disable VPU watchdog */
+ vpu_cfg_writel(vpu,
+ vpu_cfg_readl(vpu, VPU_WDT_REG) & ~(1L<<31),
+ VPU_WDT_REG);

This code combines a reference counted clock disable with a not-reference counted register write and will result in the watchdot being spuriously disabled.

This will definitely happen if vpu_debug_read() is called at the wrong time, possibly may also be an issue for concurrent H.264 and VP8 operations.

+
+ clk_disable_unprepare(vpu->clk);
+}
+
+int vpu_enable_clock(struct platform_device *pdev)
+{
+ struct mtk_vpu *vpu = platform_get_drvdata(pdev);
+ int ret;
+
+ ret = clk_prepare_enable(vpu->clk);
+ if (ret)
+ return ret;
+ /* Enable VPU watchdog */
+ vpu_cfg_writel(vpu, vpu_cfg_readl(vpu, VPU_WDT_REG) | (1L << 31),
+ VPU_WDT_REG);

As above.


+
+ return ret;
+}
+
+int vpu_ipi_register(struct platform_device *pdev,
+ enum ipi_id id, ipi_handler_t handler,
+ const char *name, void *priv)
+{
+ struct mtk_vpu *vpu = platform_get_drvdata(pdev);
+ struct vpu_ipi_desc *ipi_desc;
+
+ if (!vpu) {
+ dev_err(&pdev->dev, "vpu device in not ready\n");
+ return -EPROBE_DEFER;
+ }
+
+ if (id < IPI_MAX && handler != NULL) {
+ ipi_desc = vpu->ipi_desc;
+ ipi_desc[id].name = name;
+ ipi_desc[id].handler = handler;
+ ipi_desc[id].priv = priv;
+ return 0;
+ }
+
+ dev_err(&pdev->dev, "register vpu ipi with invalid arguments\n");
+ return -EINVAL;
+}

This API, and manu of its friends lower down the file, appear to be a way to send 32 byte messages to different endpoints on another processor on the SoC.

Just interested to know if you evaluated the mailbox driver infrastructure for this? Its a good fit for the messaging but doesn't offer any support for the direct access to TCM or extended memory.


+int vpu_ipi_send(struct platform_device *pdev,
+ enum ipi_id id, void *buf,
+ unsigned int len, unsigned int wait)
+{
+ struct mtk_vpu *vpu = platform_get_drvdata(pdev);
+ struct share_obj *send_obj = vpu->send_buf;
+ unsigned long timeout;
+
+ if (id >= IPI_MAX || len > sizeof(send_obj->share_buf) || buf == NULL) {
+ dev_err(vpu->dev, "failed to send ipi message\n");
+ return -EINVAL;
+ }
+
+ if (!vpu_running(vpu)) {
+ dev_err(vpu->dev, "vpu_ipi_send: VPU is not running\n");
+ return -ENXIO;
+ }
+
+ mutex_lock(&vpu->vpu_mutex);
+ if (vpu_cfg_readl(vpu, HOST_TO_VPU) && !wait) {
+ mutex_unlock(&vpu->vpu_mutex);
+ return -EBUSY;
+ }

This branch is unreachable (no caller ever sets wait is never set to false).


+
+ if (wait)
+ while (vpu_cfg_readl(vpu, HOST_TO_VPU))
+ ;

What is this loop for? This code should only be reachable if we timed out in the code below so the likely effect of this loop is to permantently wedge a thread in a manner that frustrates signal delivery.


+
+ memcpy((void *)send_obj->share_buf, buf, len);
+ send_obj->len = len;
+ send_obj->id = id;
+ vpu_cfg_writel(vpu, 0x1, HOST_TO_VPU);
+
+ /* Wait until VPU receives the command */
+ timeout = jiffies + msecs_to_jiffies(IPI_TIMEOUT_MS);
+ do {
+ if (time_after(jiffies, timeout)) {
+ dev_err(vpu->dev, "vpu_ipi_send: IPI timeout!\n");
+ return -EIO;
+ }
+ } while (vpu_cfg_readl(vpu, HOST_TO_VPU));

Do we need to busy wait every time we communicate with the co-processor? Couldn't we put this wait *before* we write to HOST_TO_VPU above.

That way we only spin when there is a need to.


+
+ mutex_unlock(&vpu->vpu_mutex);
+
+ return 0;
+}
+
+void *vpu_mapping_dm_addr(struct platform_device *pdev,
+ void *dtcm_dmem_addr)


Use a different type: dtcm_dmem_addr is not a pointer on this CPU.

+{
+ struct mtk_vpu *vpu = platform_get_drvdata(pdev);
+ unsigned long p_vpu_dtcm = (unsigned long)VPU_DTCM(vpu);

p_vpu_dtcm *is* a pointer on this CPU. Why cast it so that it isn't.

+ unsigned long ul_dtcm_dmem_addr = (unsigned long)(dtcm_dmem_addr);
+
+ if (dtcm_dmem_addr == NULL ||
+ (ul_dtcm_dmem_addr > (VPU_DTCM_SIZE + VPU_EXT_D_SIZE))) {
+ dev_err(vpu->dev, "invalid virtual data memory address\n");
+ return ERR_PTR(-EINVAL);
+ }
+
+ if (ul_dtcm_dmem_addr < VPU_DTCM_SIZE)
+ return (void *)(ul_dtcm_dmem_addr + p_vpu_dtcm);

Starting with the pointer will preserve type better:

return vpu->reg.sram + VPU_DTCM_OFFSET + ul_dtcm_dmem_addr;

+
+ return (void *)((ul_dtcm_dmem_addr - VPU_DTCM_SIZE) +
+ VPU_DMEM0_VIRT(vpu));

Likewise this code is clearer with the pointer first.

return vpu->mem.d_va + (ul_dtcm_dmem_addr - VPU_DTCM_SIZE);


+}
+
+dma_addr_t *vpu_mapping_iommu_dm_addr(struct platform_device *pdev,
+ void *dmem_addr)

This function does not return a pointer to a dma_addr_t. It just returns a regular dma_addr_t (that's why all the callers of this function cast it back ;-) ).


dmem_addr is also not a pointer and this function does not return a pointer to a dma_addr_t. It just returns a regular dma_addr_t.

+{
+ unsigned long ul_dmem_addr = (unsigned long)(dmem_addr);
+ struct mtk_vpu *vpu = platform_get_drvdata(pdev);
+
+ if (dmem_addr == NULL ||
+ (ul_dmem_addr < VPU_DTCM_SIZE) ||
+ (ul_dmem_addr > (VPU_DTCM_SIZE + VPU_EXT_D_SIZE))) {
+ dev_err(vpu->dev, "invalid IOMMU data memory address\n");
+ return ERR_PTR(-EINVAL);
+ }
+
+ return (dma_addr_t *)((ul_dmem_addr - VPU_DTCM_SIZE) +
+ VPU_DMEM0_IOVA(vpu));

Better written as (this would also have made explicit the type problem with the function:

return vpu->mem.d_iova + (ul_dmem_addr - VPU_DTCM_SIZE);

+}
+
+struct platform_device *vpu_get_plat_device(struct platform_device *pdev)
+{
+ struct device *dev = &pdev->dev;
+ struct device_node *vpu_node;
+ struct platform_device *vpu_pdev;
+
+ vpu_node = of_parse_phandle(dev->of_node, "vpu", 0);
+ if (!vpu_node) {
+ dev_err(dev, "can't get vpu node\n");
+ return NULL;
+ }
+
+ vpu_pdev = of_find_device_by_node(vpu_node);
+ if (WARN_ON(!vpu_pdev)) {
+ dev_err(dev, "vpu pdev failed\n");
+ of_node_put(vpu_node);
+ return NULL;
+ }
+
+ return vpu_pdev;
+}

This function looks a bit weird to me. Why do we need to keep consulting the devicetree every time we want to start/stop the clock?

I would have expected this code to be part of the init routine of anything that needs this and the platform_device would then be cached.


+
+/* load vpu program/data memory */
+static void load_requested_vpu(struct mtk_vpu *vpu,
+ size_t fw_size,
+ const u8 *fw_data,
+ u8 fw_type)
+{
+ size_t target_size = fw_type ? VPU_DTCM_SIZE : VPU_PTCM_SIZE;
+ size_t extra_fw_size = 0;
+ void *dest;
+
+ /* reset VPU */
+ vpu_cfg_writel(vpu, 0x0, VPU_BASE);
+
+ /* handle extended firmware size */
+ if (fw_size > target_size) {
+ dev_dbg(vpu->dev, "fw size %lx > limited fw size %lx\n",
+ fw_size, target_size);
+ extra_fw_size = fw_size - target_size;
+ dev_dbg(vpu->dev, "extra_fw_size %lx\n", extra_fw_size);
+ fw_size = target_size;
+ }
+ dest = fw_type ? VPU_DTCM(vpu) : VPU_PTCM(vpu);
+ memcpy(dest, fw_data, fw_size);
+ /* download to extended memory if need */
+ if (extra_fw_size > 0) {
+ dest = fw_type ?
+ VPU_DMEM0_VIRT(vpu) : VPU_PMEM0_VIRT(vpu);
+
+ dev_dbg(vpu->dev, "download extended memory type %x\n",
+ fw_type);
+ memcpy(dest, fw_data + target_size, extra_fw_size);
+ }
+}
+
+int vpu_load_firmware(struct platform_device *pdev)
+{
+ struct mtk_vpu *vpu = platform_get_drvdata(pdev);
+ struct device *dev = &pdev->dev;
+ struct vpu_run *run = &vpu->run;
+ const struct firmware *vpu_fw;
+ int ret;
+
+ if (!pdev) {
+ dev_err(dev, "VPU platform device is invalid\n");
+ return -EINVAL;
+ }
+
+ mutex_lock(&vpu->vpu_mutex);
+
+ ret = vpu_enable_clock(pdev);
+ if (ret) {
+ dev_err(dev, "enable clock failed %d\n", ret);
+ goto OUT_LOAD_FW;
+ }
+
+ if (vpu_running(vpu)) {
+ vpu_disable_clock(pdev);
+ mutex_unlock(&vpu->vpu_mutex);
+ dev_warn(dev, "vpu is running already\n");
+ return 0;
+ }
+
+ run->signaled = false;
+ dev_dbg(vpu->dev, "firmware request\n");
+ ret = request_firmware(&vpu_fw, VPU_P_FW, dev);
+ if (ret < 0) {
+ dev_err(dev, "Failed to load %s, %d\n", VPU_P_FW, ret);
+ goto OUT_LOAD_FW;
+ }
+ if (vpu_fw->size > VPU_P_FW_SIZE) {
+ ret = -EFBIG;
+ dev_err(dev, "program fw size %zu is abnormal\n", vpu_fw->size);
+ goto OUT_LOAD_FW;
+ }

Possibly move request_firmware(), release_firmware() and the associated error handling into load_requested_vpu(). It can all be parameterized and the filename of the firmware means error reports will still be clear about whether the p or d firmware is faulty.


+ dev_dbg(vpu->dev, "Downloaded program fw size: %zu.\n",
+ vpu_fw->size);
+ /* Downloading program firmware to device*/
+ load_requested_vpu(vpu, vpu_fw->size, vpu_fw->data,
+ P_FW);
+ release_firmware(vpu_fw);
+
+ ret = request_firmware(&vpu_fw, VPU_D_FW, dev);
+ if (ret < 0) {
+ dev_err(dev, "Failed to load %s, %d\n", VPU_D_FW, ret);
+ goto OUT_LOAD_FW;
+ }
+ if (vpu_fw->size > VPU_D_FW_SIZE) {
+ ret = -EFBIG;
+ dev_err(dev, "data fw size %zu is abnormal\n", vpu_fw->size);
+ goto OUT_LOAD_FW;
+ }
+ dev_dbg(vpu->dev, "Downloaded data fw size: %zu.\n",
+ vpu_fw->size);
+ /* Downloading data firmware to device */
+ load_requested_vpu(vpu, vpu_fw->size, vpu_fw->data,
+ D_FW);
+ release_firmware(vpu_fw);
+ /* boot up vpu */
+ vpu_cfg_writel(vpu, 0x1, VPU_BASE);
+
+ ret = wait_event_interruptible_timeout(run->wq,
+ run->signaled,
+ msecs_to_jiffies(INIT_TIMEOUT_MS)
+ );
+ if (0 == ret) {
+ ret = -ETIME;
+ dev_err(dev, "wait vpu initialization timout!\n");
+ goto OUT_LOAD_FW;
+ } else if (-ERESTARTSYS == ret) {
+ dev_err(dev, "wait vpu interrupted by a signal!\n");
+ goto OUT_LOAD_FW;
+ }
+
+ ret = 0;
+ dev_info(dev, "vpu is ready. Fw version %s\n", run->fw_ver);
+
+OUT_LOAD_FW:
+ vpu_disable_clock(pdev);
+ mutex_unlock(&vpu->vpu_mutex);
+
+ return ret;
+}
+
+static void vpu_init_ipi_handler(void *data, unsigned int len, void *priv)
+{
+ struct mtk_vpu *vpu = (struct mtk_vpu *)priv;
+ struct vpu_run *run = (struct vpu_run *)data;
+
+ vpu->run.signaled = run->signaled;
+ strncpy(vpu->run.fw_ver, run->fw_ver, VPU_FW_VER_LEN);
+ wake_up_interruptible(&vpu->run.wq);
+}
+
+static int vpu_debug_open(struct inode *inode, struct file *file)
+{
+ file->private_data = inode->i_private;
+ return 0;
+}
+
+static ssize_t vpu_debug_read(struct file *file, char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ char buf[256];
+ unsigned int len;
+ unsigned int running, pc, vpu_to_host, host_to_vpu, wdt;
+ int ret;
+ struct device *dev = file->private_data;
+ struct platform_device *pdev = to_platform_device(dev);
+ struct mtk_vpu *vpu = dev_get_drvdata(dev);
+
+ ret = vpu_enable_clock(pdev);
+ if (ret) {
+ dev_err(vpu->dev, "[VPU] enable clock failed %d\n", ret);
+ return 0;
+ }
+
+ /* vpu register status */
+ running = vpu_running(vpu);
+ pc = vpu_cfg_readl(vpu, VPU_PC_REG);
+ wdt = vpu_cfg_readl(vpu, VPU_WDT_REG);
+ host_to_vpu = vpu_cfg_readl(vpu, HOST_TO_VPU);
+ vpu_to_host = vpu_cfg_readl(vpu, VPU_TO_HOST);
+ vpu_disable_clock(pdev);
+
+ if (running) {
+ len = sprintf(buf, "VPU is running\n\n"
+ "FW Version: %s\n"
+ "PC: 0x%x\n"
+ "WDT: 0x%x\n"
+ "Host to VPU: 0x%x\n"
+ "VPU to Host: 0x%x\n",
+ vpu->run.fw_ver, pc, wdt,
+ host_to_vpu, vpu_to_host);
+ } else {
+ len = sprintf(buf, "VPU not running\n");
+ }
+
+ return simple_read_from_buffer(user_buf, count, ppos, buf, len);
+}
+
+static const struct file_operations vpu_debug_fops = {
+ .open = vpu_debug_open,
+ .read = vpu_debug_read,
+};

These operations should be conditional on CONFIG_DEBUG_FS.

+
+static void vpu_free_p_ext_mem(struct mtk_vpu *vpu)
+{
+ struct device *dev = vpu->dev;
+ struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+ dma_free_coherent(dev, VPU_EXT_P_SIZE, VPU_PMEM0_VIRT(vpu),
+ VPU_PMEM0_IOVA(vpu));
+
+ if (domain)
+ iommu_detach_device(domain, vpu->dev);
+}
+
+static void vpu_free_d_ext_mem(struct mtk_vpu *vpu)
+{
+ struct device *dev = vpu->dev;
+ struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+ dma_free_coherent(dev, VPU_EXT_D_SIZE, VPU_DMEM0_VIRT(vpu),
+ VPU_DMEM0_IOVA(vpu));
+
+ if (domain)
+ iommu_detach_device(domain, dev);
+}

Look like this could be parameterized and combined with vpu_free_p_ext_mem.


+
+static int vpu_alloc_p_ext_mem(struct mtk_vpu *vpu)
+{
+ struct device *dev = vpu->dev;
+ struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+ phys_addr_t p_pa;
+
+ VPU_PMEM0_VIRT(vpu) = dma_alloc_coherent(dev,
+ VPU_EXT_P_SIZE,
+ &(VPU_PMEM0_IOVA(vpu)),
+ GFP_KERNEL);
+ if (VPU_PMEM0_VIRT(vpu) == NULL) {
+ dev_err(dev, "Failed to allocate the extended program memory\n");
+ return PTR_ERR(VPU_PMEM0_VIRT(vpu));
+ }
+
+ p_pa = iommu_iova_to_phys(domain, vpu->mem.p_iova);
+ /* Disable extend0. Enable extend1 */
+ vpu_cfg_writel(vpu, 0x1, VPU_PMEM_EXT0_ADDR);
+ vpu_cfg_writel(vpu, (p_pa & 0xFFFFF000), VPU_PMEM_EXT1_ADDR);
+
+ dev_info(dev, "Program extend memory phy=0x%llx virt=0x%p iova=0x%llx\n",
+ (unsigned long long)p_pa,
+ VPU_PMEM0_VIRT(vpu),
+ (unsigned long long)VPU_PMEM0_IOVA(vpu));
+
+ return 0;
+}
+
+static int vpu_alloc_d_ext_mem(struct mtk_vpu *vpu)
+{
+ struct device *dev = vpu->dev;
+ struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+ phys_addr_t d_pa;
+
+ VPU_DMEM0_VIRT(vpu) = dma_alloc_coherent(dev,
+ VPU_EXT_D_SIZE,
+ &(VPU_DMEM0_IOVA(vpu)),
+ GFP_KERNEL);
+ if (VPU_DMEM0_VIRT(vpu) == NULL) {
+ dev_err(dev, "Failed to allocate the extended data memory\n");
+ return PTR_ERR(VPU_DMEM0_VIRT(vpu));
+ }
+
+ d_pa = iommu_iova_to_phys(domain, vpu->mem.d_iova);
+
+ /* Disable extend0. Enable extend1 */
+ vpu_cfg_writel(vpu, 0x1, VPU_DMEM_EXT0_ADDR);
+ vpu_cfg_writel(vpu, (d_pa & 0xFFFFF000),
+ VPU_DMEM_EXT1_ADDR);
+
+ dev_info(dev, "Data extend memory phy=0x%llx virt=0x%p iova=0x%llx\n",
+ (unsigned long long)d_pa,
+ VPU_DMEM0_VIRT(vpu),
+ (unsigned long long)VPU_DMEM0_IOVA(vpu));
+
+ return 0;
+}

Also looks suitable for parameterizing and combining with vpu_alloc_p_ext_mem .


+
+static void vpu_ipi_handler(struct mtk_vpu *vpu)
+{
+ struct share_obj *rcv_obj = vpu->recv_buf;
+ struct vpu_ipi_desc *ipi_desc = vpu->ipi_desc;
+
+ if (rcv_obj->id < IPI_MAX && ipi_desc[rcv_obj->id].handler) {
+ ipi_desc[rcv_obj->id].handler(rcv_obj->share_buf,
+ rcv_obj->len,
+ ipi_desc[rcv_obj->id].priv);
+ } else {
+ dev_err(vpu->dev, "No such ipi id = %d\n", rcv_obj->id);
+ }
+}
+
+static int vpu_ipi_init(struct mtk_vpu *vpu)
+{
+ /* Disable VPU to host interrupt */
+ vpu_cfg_writel(vpu, 0x0, VPU_TO_HOST);
+
+ /* shared buffer initialization */
+ vpu->recv_buf = (struct share_obj *)VPU_DTCM(vpu);
+ vpu->send_buf = vpu->recv_buf + 1;
+ memset(vpu->recv_buf, 0, sizeof(struct share_obj));
+ memset(vpu->send_buf, 0, sizeof(struct share_obj));
+ mutex_init(&vpu->vpu_mutex);
+
+ return 0;
+}
+
+static irqreturn_t vpu_irq_handler(int irq, void *priv)
+{
+ struct mtk_vpu *vpu = priv;
+ uint32_t vpu_to_host = vpu_cfg_readl(vpu, VPU_TO_HOST);
+
+ if (vpu_to_host & VPU_IPC_INT)
+ vpu_ipi_handler(vpu);
+ else
+ dev_err(vpu->dev, "vpu watchdog timeout!\n");
+
+ /* VPU won't send another interrupt until we set VPU_TO_HOST to 0. */
+ vpu_cfg_writel(vpu, 0x0, VPU_TO_HOST);

If we were triggered by a watchdog then how long will it be before the next watchdog interrupt? Will we end up spamming the logs?
+
+ return IRQ_HANDLED;
+}
+
+static struct dentry *vpu_debugfs;
+static int mtk_vpu_probe(struct platform_device *pdev)
+{
+ struct mtk_vpu *vpu;
+ struct device *dev;
+ struct resource *res;
+ int ret = 0;
+
+ dev_dbg(&pdev->dev, "initialization\n");
+
+ dev = &pdev->dev;
+ vpu = devm_kzalloc(dev, sizeof(*vpu), GFP_KERNEL);
+ if (!vpu)
+ return -ENOMEM;
+
+ vpu->dev = &pdev->dev;
+ res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "sram");
+ vpu->reg.sram = devm_ioremap_resource(dev, res);
+ if (IS_ERR(vpu->reg.sram)) {
+ dev_err(dev, "devm_ioremap_resource vpu sram failed.\n");
+ return PTR_ERR(vpu->reg.sram);
+ }
+
+ res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "cfg_reg");
+ vpu->reg.cfg = devm_ioremap_resource(dev, res);
+ if (IS_ERR(vpu->reg.cfg)) {
+ dev_err(dev, "devm_ioremap_resource vpu cfg failed.\n");
+ return PTR_ERR(vpu->reg.cfg);
+ }
+
+ /* Get VPU clock */
+ vpu->clk = devm_clk_get(dev, "main");
+ if (vpu->clk == NULL) {
+ dev_err(dev, "get vpu clock fail\n");
+ return -EINVAL;
+ }
+
+ platform_set_drvdata(pdev, vpu);
+
+ ret = vpu_enable_clock(pdev);
+ if (ret) {
+ ret = -EINVAL;
+ return ret;

Why not just "return ret" (or "return -EINVAL")?

+ }
+
+ dev_dbg(dev, "vpu ipi init\n");
+ ret = vpu_ipi_init(vpu);
+ if (ret) {
+ dev_err(dev, "Failed to init ipi\n");
+ goto disable_vpu_clk;
+ }
+
+ platform_set_drvdata(pdev, vpu);
+
+ /* register vpu initialization IPI */
+ ret = vpu_ipi_register(pdev, IPI_VPU_INIT, vpu_init_ipi_handler,
+ "vpu_init", vpu);
+ if (ret) {
+ dev_err(dev, "Failed to register IPI_VPU_INIT\n");
+ goto vpu_mutex_destroy;
+ }
+
+ vpu_debugfs = debugfs_create_file("mtk_vpu", S_IRUGO, NULL, (void *)dev,
+ &vpu_debug_fops);
+ if (!vpu_debugfs) {
+ ret = -ENOMEM;
+ goto cleanup_ipi;
+ }
+
+ /* Set PTCM to 96K and DTCM to 32K */
+ vpu_cfg_writel(vpu, 0x2, VPU_TCM_CFG);
+
+ ret = vpu_alloc_p_ext_mem(vpu);
+ if (ret) {
+ dev_err(dev, "Allocate PM failed\n");
+ goto remove_debugfs;
+ }
+
+ ret = vpu_alloc_d_ext_mem(vpu);
+ if (ret) {
+ dev_err(dev, "Allocate DM failed\n");
+ goto free_p_mem;
+ }
+
+ init_waitqueue_head(&vpu->run.wq);
+
+ res = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
+ if (res == NULL) {
+ dev_err(dev, "get IRQ resource failed.\n");
+ ret = -ENXIO;
+ goto free_d_mem;
+ }
+ vpu->reg.irq = platform_get_irq(pdev, 0);
+ ret = devm_request_irq(dev, vpu->reg.irq, vpu_irq_handler, 0,
+ pdev->name, vpu);
+ if (ret) {
+ dev_err(dev, "failed to request irq\n");
+ goto free_d_mem;
+ }
+
+ vpu_disable_clock(pdev);
+ dev_dbg(dev, "initialization completed\n");
+
+ return 0;
+
+free_d_mem:
+ vpu_free_d_ext_mem(vpu);
+free_p_mem:
+ vpu_free_p_ext_mem(vpu);
+remove_debugfs:
+ debugfs_remove(vpu_debugfs);
+cleanup_ipi:
+ memset(vpu->ipi_desc, 0, sizeof(struct vpu_ipi_desc)*IPI_MAX);
+vpu_mutex_destroy:
+ mutex_destroy(&vpu->vpu_mutex);
+disable_vpu_clk:
+ vpu_disable_clock(pdev);
+
+ return ret;
+}
+
+static const struct of_device_id mtk_vpu_match[] = {
+ {
+ .compatible = "mediatek,mt8173-vpu",
+ },
+ {},
+};
+MODULE_DEVICE_TABLE(of, mtk_vpu_match);
+
+static int mtk_vpu_remove(struct platform_device *pdev)
+{
+ struct mtk_vpu *vpu = platform_get_drvdata(pdev);
+
+ vpu_free_p_ext_mem(vpu);
+ vpu_free_d_ext_mem(vpu);

This looks like it leaks cpu_debugfs and the vpu_mutex.

+
+ return 0;
+}
+
+static struct platform_driver mtk_vpu_driver = {
+ .probe = mtk_vpu_probe,
+ .remove = mtk_vpu_remove,
+ .driver = {
+ .name = MTK_VPU_DRV_NAME,
+ .owner = THIS_MODULE,
+ .of_match_table = mtk_vpu_match,
+ },
+};
+
+module_platform_driver(mtk_vpu_driver);
+
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("Mediatek Video Prosessor Unit driver");
diff --git a/drivers/media/platform/mtk-vpu/mtk_vpu_core.h b/drivers/media/platform/mtk-vpu/mtk_vpu_core.h
new file mode 100644
index 0000000..20cf2a0
--- /dev/null
+++ b/drivers/media/platform/mtk-vpu/mtk_vpu_core.h
@@ -0,0 +1,161 @@
+/*
+* Copyright (c) 2015 MediaTek Inc.
+* Author: Andrew-CT Chen <andrew-ct.chen@xxxxxxxxxxxx>
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License version 2 as
+* published by the Free Software Foundation.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+* GNU General Public License for more details.
+*/
+
+#ifndef _MTK_VPU_CORE_H
+#define _MTK_VPU_CORE_H
+
+#include <linux/platform_device.h>
+
+/**
+ * VPU (video processor unit) is a tiny processor controlling video hardware
+ * related to video codec, scaling and color format converting.
+ * VPU interfaces with other blocks by share memory and interrupt.
+ **/
+
+typedef void (*ipi_handler_t) (void *data,
+ unsigned int len,
+ void *priv);
+
+/**
+ * enum ipi_id - the id of inter-processor interrupt
+ *
+ * @IPI_VPU_INIT: The interrupt from vpu is to notfiy kernel
+ VPU initialization completed.
+ * @IPI_VENC_H264: The interrupt from vpu is to notify kernel to
+ handle H264 video encoder job, and vice versa.
+ * @IPI_VENC_VP8: The interrupt fro vpu is to notify kernel to
+ handle VP8 video encoder job,, and vice versa.
+ * @IPI_VENC_CAPABILITY: The interrupt from vpu is to
+ get venc hardware capability.
+ * @IPI_MAX: The maximum IPI number
+ */
+enum ipi_id {
+ IPI_VPU_INIT = 0,
+ IPI_VENC_H264,
+ IPI_VENC_VP8,
+ IPI_VENC_CAPABILITY,
+ IPI_MAX,
+};
+
+/**
+ * vpu_disable_clock - Disable VPU clock
+ *
+ * @pdev: VPU platform device
+ *
+ *
+ * Return: Return 0 if the clock is disabled successfully,
+ * otherwise it is failed.
+ *
+ **/
+void vpu_disable_clock(struct platform_device *pdev);
+
+/**
+ * vpu_enable_clock - Enable VPU clock
+ *
+ * @pdev: VPU platform device
+ *
+ * Return: Return 0 if the clock is enabled successfully,
+ * otherwise it is failed.
+ *
+ **/
+int vpu_enable_clock(struct platform_device *pdev);
+
+/**
+ * vpu_ipi_register - register an ipi function
+ *
+ * @pdev: VPU platform device
+ * @id: IPI ID
+ * @handler: IPI handler
+ * @name: IPI name
+ * @priv: private data for IPI handler
+ *
+ * Register an ipi function to receive ipi interrupt from VPU.
+ *
+ * Return: Return 0 if ipi registers successfully, otherwise it is failed.
+ */
+int vpu_ipi_register(struct platform_device *pdev, enum ipi_id id,
+ ipi_handler_t handler, const char *name, void *priv);
+
+/**
+ * vpu_ipi_send - send data from AP to vpu.
+ *
+ * @pdev: VPU platform device
+ * @id: IPI ID
+ * @buf: the data buffer
+ * @len: the data buffer length
+ * @wait: wait for the last ipi completed.
+ *
+ * This function is thread-safe. When this function returns,
+ * VPU has received the data and starts the processing.
+ * When the processing completes, IPI handler registered
+ * by vpu_ipi_register will be called in interrupt context.
+ *
+ * Return: Return 0 if sending data successfully, otherwise it is failed.
+ **/
+int vpu_ipi_send(struct platform_device *pdev,
+ enum ipi_id id, void *buf,
+ unsigned int len,
+ unsigned int wait);
+
+/**
+ * vpu_get_plat_device - get VPU's platform device
+ *
+ * @pdev: the platform device of the module requesting VPU platform
+ * device for using VPU API.
+ *
+ * Return: Return NULL if it is failed.
+ * otherwise it is VPU's platform device
+ **/
+struct platform_device *vpu_get_plat_device(struct platform_device *pdev);
+
+/**
+ * vpu_load_firmware - download VPU firmware and boot it
+ *
+ * @pdev: VPU platform device
+ *
+ * Return: Return 0 if downloading firmware successfully,
+ * otherwise it is failed
+ **/
+int vpu_load_firmware(struct platform_device *pdev);
+
+/**
+ * vpu_mapping_dm_addr - Mapping DTCM/DMEM to kernel virtual address
+ *
+ * @pdev: VPU platform device
+ * @dmem_addr: VPU's data memory address
+ *
+ * Mapping the VPU's DTCM (Data Tightly-Coupled Memory) /
+ * DMEM (Data Extended Memory) memory address to
+ * kernel virtual address.
+ *
+ * Return: Return ERR_PTR(-EINVAL) if mapping failed,
+ * otherwise the mapped kernel virtual address
+ **/
+void *vpu_mapping_dm_addr(struct platform_device *pdev,
+ void *dtcm_dmem_addr);
+
+/**
+ * vpu_mapping_iommu_dm_addr - Mapping to iommu address
+ *
+ * @pdev: VPU platform device
+ * @dmem_addr: VPU's extended data memory address
+ *
+ * Mapping the VPU's extended data address to iommu address
+ *
+ * Return: Return ERR_PTR(-EINVAL) if mapping failed,
+ * otherwise the mapped iommu address
+ **/
+dma_addr_t *vpu_mapping_iommu_dm_addr(struct platform_device *pdev,
+ void *dmem_addr);
+#endif /* _MTK_VPU_CORE_H */
diff --git a/drivers/media/platform/mtk-vpu/vp8_enc/venc_vp8_vpu.h b/drivers/media/platform/mtk-vpu/vp8_enc/venc_vp8_vpu.h
new file mode 100644
index 0000000..4e09eec
--- /dev/null
+++ b/drivers/media/platform/mtk-vpu/vp8_enc/venc_vp8_vpu.h

Like the H.264 header. Why is this file included in this patch? It is not included by anything in the patch and defines symbols that do not exist yet.


Daniel.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/