Re: [PATCH] media: venus: use contig vb2 ops

From: Robin Murphy
Date: Tue Dec 15 2020 - 08:33:24 EST


On 2020-12-15 11:47, Tomasz Figa wrote:
On Tue, Dec 15, 2020 at 8:16 PM Stanimir Varbanov
<stanimir.varbanov@xxxxxxxxxx> wrote:

Hi,

Cc: Robin

On 12/14/20 2:57 PM, Alexandre Courbot wrote:
This driver uses the SG vb2 ops, but effectively only ever accesses the
first entry of the SG table, indicating that it expects a flat layout.
Switch it to use the contiguous ops to make sure this expected invariant

Under what circumstances the sg table will has nents > 1? I came down to
[1] but not sure I got it right.

I'm afraid that for systems with low amount of system memory and when
the memory become fragmented, the driver will not work. That's why I
started with sg allocator.

It is exactly the opposite. The vb2-dma-contig allocator is "contig"
in terms of the DMA (aka IOVA) address space. In other words, it
guarantees that having one DMA address and length fully describes the
buffer. This seems to be the requirement of the hardware/firmware
handled by the venus driver. If the device is behind an IOMMU, which
is the case for the SoCs in question, the underlying DMA ops will
actually allocate a discontiguous set of pages, so it has nothing to
do to system memory amount or fragmentation. If for some reason the
IOMMU can't be used, there is no way around, the memory needs to be
contiguous because of the hardware/firmware/driver expectation.

On the other hand, the vb2-dma-sg allocator doesn't have any
continuity guarantees for the DMA, or any other, address space.

Yes, intuitively one would assume that the sg code was for devices with native scatter-gather capability that can deal with an actual set of buffer descriptors, rather than just a single pointer (which is the original purpose of scatterlists, after all). I've always been slightly puzzled why the two seem to be quite so similar.

The
current code works fine, because it calls dma_map_sg() on the whole
set of pages and that ends up mapping it contiguously in the IOVA
space, but that's just an implementation detail, not an API guarantee.

Oh, the fun we've had over that implementation detail! :P

Robin.

Best regards,
Tomasz


[1]
https://elixir.bootlin.com/linux/v5.10.1/source/drivers/iommu/dma-iommu.c#L782

is always enforced. Since the device is supposed to be behind an IOMMU
this should have little to none practical consequences beyond making the
driver not rely on a particular behavior of the SG implementation.

Reported-by: Tomasz Figa <tfiga@xxxxxxxxxxxx>
Signed-off-by: Alexandre Courbot <acourbot@xxxxxxxxxxxx>
---
Hi everyone,

It probably doesn't hurt to fix this issue before some actual issue happens.
I have tested this patch on Chrome OS and playback was just as fine as with
the SG ops.

drivers/media/platform/Kconfig | 2 +-
drivers/media/platform/qcom/venus/helpers.c | 9 ++-------
drivers/media/platform/qcom/venus/vdec.c | 6 +++---
drivers/media/platform/qcom/venus/venc.c | 6 +++---
4 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index 35a18d388f3f..d9d7954111f2 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -533,7 +533,7 @@ config VIDEO_QCOM_VENUS
depends on INTERCONNECT || !INTERCONNECT
select QCOM_MDT_LOADER if ARCH_QCOM
select QCOM_SCM if ARCH_QCOM
- select VIDEOBUF2_DMA_SG
+ select VIDEOBUF2_DMA_CONTIG
select V4L2_MEM2MEM_DEV
help
This is a V4L2 driver for Qualcomm Venus video accelerator
diff --git a/drivers/media/platform/qcom/venus/helpers.c b/drivers/media/platform/qcom/venus/helpers.c
index 50439eb1ffea..859d260f002b 100644
--- a/drivers/media/platform/qcom/venus/helpers.c
+++ b/drivers/media/platform/qcom/venus/helpers.c
@@ -7,7 +7,7 @@
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/kernel.h>
-#include <media/videobuf2-dma-sg.h>
+#include <media/videobuf2-dma-contig.h>
#include <media/v4l2-mem2mem.h>
#include <asm/div64.h>

@@ -1284,14 +1284,9 @@ int venus_helper_vb2_buf_init(struct vb2_buffer *vb)
struct venus_inst *inst = vb2_get_drv_priv(vb->vb2_queue);
struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
struct venus_buffer *buf = to_venus_buffer(vbuf);
- struct sg_table *sgt;
-
- sgt = vb2_dma_sg_plane_desc(vb, 0);
- if (!sgt)
- return -EFAULT;

buf->size = vb2_plane_size(vb, 0);
- buf->dma_addr = sg_dma_address(sgt->sgl);

Can we do it:

if (WARN_ON(sgt->nents > 1))
return -EFAULT;

I understand that logically using dma-sg when the flat layout is
expected by the hardware is wrong, but I haven't seen issues until now.

+ buf->dma_addr = vb2_dma_contig_plane_dma_addr(vb, 0);

if (vb->type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE)
list_add_tail(&buf->reg_list, &inst->registeredbufs);
diff --git a/drivers/media/platform/qcom/venus/vdec.c b/drivers/media/platform/qcom/venus/vdec.c
index 8488411204c3..3fb277c81aca 100644
--- a/drivers/media/platform/qcom/venus/vdec.c
+++ b/drivers/media/platform/qcom/venus/vdec.c
@@ -13,7 +13,7 @@
#include <media/v4l2-event.h>
#include <media/v4l2-ctrls.h>
#include <media/v4l2-mem2mem.h>
-#include <media/videobuf2-dma-sg.h>
+#include <media/videobuf2-dma-contig.h>

#include "hfi_venus_io.h"
#include "hfi_parser.h"
@@ -1461,7 +1461,7 @@ static int m2m_queue_init(void *priv, struct vb2_queue *src_vq,
src_vq->io_modes = VB2_MMAP | VB2_DMABUF;
src_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
src_vq->ops = &vdec_vb2_ops;
- src_vq->mem_ops = &vb2_dma_sg_memops;
+ src_vq->mem_ops = &vb2_dma_contig_memops;
src_vq->drv_priv = inst;
src_vq->buf_struct_size = sizeof(struct venus_buffer);
src_vq->allow_zero_bytesused = 1;
@@ -1475,7 +1475,7 @@ static int m2m_queue_init(void *priv, struct vb2_queue *src_vq,
dst_vq->io_modes = VB2_MMAP | VB2_DMABUF;
dst_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
dst_vq->ops = &vdec_vb2_ops;
- dst_vq->mem_ops = &vb2_dma_sg_memops;
+ dst_vq->mem_ops = &vb2_dma_contig_memops;
dst_vq->drv_priv = inst;
dst_vq->buf_struct_size = sizeof(struct venus_buffer);
dst_vq->allow_zero_bytesused = 1;
diff --git a/drivers/media/platform/qcom/venus/venc.c b/drivers/media/platform/qcom/venus/venc.c
index 1c61602c5de1..a09550cd1dba 100644
--- a/drivers/media/platform/qcom/venus/venc.c
+++ b/drivers/media/platform/qcom/venus/venc.c
@@ -10,7 +10,7 @@
#include <linux/pm_runtime.h>
#include <linux/slab.h>
#include <media/v4l2-mem2mem.h>
-#include <media/videobuf2-dma-sg.h>
+#include <media/videobuf2-dma-contig.h>
#include <media/v4l2-ioctl.h>
#include <media/v4l2-event.h>
#include <media/v4l2-ctrls.h>
@@ -1001,7 +1001,7 @@ static int m2m_queue_init(void *priv, struct vb2_queue *src_vq,
src_vq->io_modes = VB2_MMAP | VB2_USERPTR | VB2_DMABUF;
src_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
src_vq->ops = &venc_vb2_ops;
- src_vq->mem_ops = &vb2_dma_sg_memops;
+ src_vq->mem_ops = &vb2_dma_contig_memops;
src_vq->drv_priv = inst;
src_vq->buf_struct_size = sizeof(struct venus_buffer);
src_vq->allow_zero_bytesused = 1;
@@ -1017,7 +1017,7 @@ static int m2m_queue_init(void *priv, struct vb2_queue *src_vq,
dst_vq->io_modes = VB2_MMAP | VB2_USERPTR | VB2_DMABUF;
dst_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
dst_vq->ops = &venc_vb2_ops;
- dst_vq->mem_ops = &vb2_dma_sg_memops;
+ dst_vq->mem_ops = &vb2_dma_contig_memops;
dst_vq->drv_priv = inst;
dst_vq->buf_struct_size = sizeof(struct venus_buffer);
dst_vq->allow_zero_bytesused = 1;


--
regards,
Stan