Re: [PATCH v4 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig

From: Mikhail Rudenko
Date: Mon Mar 10 2025 - 04:52:15 EST



Hi Nicolas, Tomasz,

On 2025-03-03 at 10:24 -05, Nicolas Dufresne <nicolas@xxxxxxxxxxxx> wrote:

> Hi Mikhail,
>
> Le lundi 03 mars 2025 à 14:40 +0300, Mikhail Rudenko a écrit :
>> When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in
>> commit 129134e5415d ("media: media/v4l2: remove
>> V4L2_FLAG_MEMORY_NON_CONSISTENT flag"),
>> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made
>> no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was
>> introduced in commit c0acf9cfeee0 ("media: videobuf2: handle
>> V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained
>> no-ops, making cache maintenance for non-coherent dmabufs allocated
>> by
>> dma-contig impossible.
>>
>> Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and
>> {flush,invalidate}_kernel_vmap_range calls to
>> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent
>> buffers.
>>
>> Fixes: c0acf9cfeee0 ("media: videobuf2: handle
>> V4L2_MEMORY_FLAG_NON_COHERENT flag")
>> Cc: stable@xxxxxxxxxxxxxxx
>> Signed-off-by: Mikhail Rudenko <mike.rudenko@xxxxxxxxx>
>> ---
>>  .../media/common/videobuf2/videobuf2-dma-contig.c  | 22
>> ++++++++++++++++++++++
>>  1 file changed, 22 insertions(+)
>>
>> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> index
>> a13ec569c82f6da2d977222b94af32e74c6c6c82..d41095fe5bd21faf815d6b035d7
>> bc888a84a95d5 100644
>> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> @@ -427,6 +427,17 @@ static int
>>  vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
>>      enum dma_data_direction
>> direction)
>>  {
>> + struct vb2_dc_buf *buf = dbuf->priv;
>> + struct sg_table *sgt = buf->dma_sgt;
>> +
>> + if (!buf->non_coherent_mem)
>> + return 0;
>> +
>> + if (buf->vaddr)
>> + invalidate_kernel_vmap_range(buf->vaddr, buf->size);
>
> What would make me a lot more confortable with this change is if you
> enable kernel mappings for one test. This will ensure you cover the
> call to "invalidate" in your testing. I'd like to know about the
> performance impact. With this implementation it should be identical to
> the VB2 one.
>

I have re-run my tests on RK3399, with 1280x720 XRGB capture buffers (1
plane, 3686400 bytes). Capture process was pinned to "big" cores running
at fixed frequency of 1.8 GHz. Libcamera was modified to request buffers
with V4L2_MEMORY_FLAG_NON_COHERENT flag. DMA_BUF_IOCTL_SYNC ioctls were
used as appropriate. For kernel mapping effect test, vb2_plane_vaddr
call was inserted into rkisp1_vb2_buf_init.

The timings are as following:

- memcpy coherent buffer: 7570 +/- 63 us
- memcpy non-coherent buffer: 1120 +/- 34 us

without kernel mapping:

- ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_START|DMA_BUF_SYNC_READ}): 428 +/- 15 us
- ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_END|DMA_BUF_SYNC_READ}): 422 +/- 28 us

with kernel mapping:

- ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_START|DMA_BUF_SYNC_READ}): 526 +/- 13 us
- ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_END|DMA_BUF_SYNC_READ}): 519 +/- 38 us

So, even with kernel mapping enabled, speedup is 7570 / (1120 + 526 + 519) = 3.5,
and the use of noncoherent buffers is justified -- at least on this platform.

--
Best regards,
Mikhail Rudenko