RE: Question for an accepted patch: use of DMA-BUF based videobuf2 capture buffer with no-HW-cache-coherent HW

From: yuji2.ishikawa
Date: Wed Oct 26 2022 - 05:22:21 EST


Hi Hans,

> -----Original Message-----
> From: Hans Verkuil <hverkuil-cisco@xxxxxxxxx>
> Sent: Monday, October 24, 2022 4:49 PM
> To: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開)
> <yuji2.ishikawa@xxxxxxxxxxxxx>; posciak@xxxxxxxxxxxx;
> paul.kocialkowski@xxxxxxxxxxx; mchehab+samsung@xxxxxxxxxx;
> linux-media@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: Question for an accepted patch: use of DMA-BUF based videobuf2
> capture buffer with no-HW-cache-coherent HW
>
> Hi Yuji,
>
> On 10/24/22 06:02, yuji2.ishikawa@xxxxxxxxxxxxx wrote:
> > Hi,
> >
> > I'm porting a V4L2 capture driver from 4.19.y to 5.10.y [1].
> >
> > When I test the ported driver, I sometimes find corruption in a captured
> > image.
> >
> > Because the corruption is exactly aligned with cache lines, I started
> > investigating from the map/unmap of DMA-BUF.
> >
> > The capture driver uses DMA-BUF for videobuf2.
> >
> > The capture hardware does not have HW-maintained cache coherency with
> > the CPU; that is, explicit map/unmap is essential on QBUF/DQBUF.
> >
> > After some hours of struggle, I found a patch removing cache
> > synchronization on QBUF/DQBUF.
> >
> > https://patchwork.kernel.org/project/linux-media/patch/20190124095156.21898-1-paul.kocialkowski@xxxxxxxxxxx/
> >
> > When I removed this patch from my 5.10.y working tree, the driver
> > yielded images without any defects.
> >
> > ***************
> >
> > Sorry for bringing up a patch released 4 years ago.
> >
> > The patch removes map/unmap on QBUF/DQBUF to improve the performance
> > of V4L2 decoder devices by reusing previously decoded frames.
> >
> > However, there seems to be no care taken, nor any compensation, for the
> > modified DMA-BUF lifecycle, especially on video capture devices.
>
> I'm not entirely sure what you mean exactly.
>
My concern is the consistency between the ioctls and the state transitions of capture buffers.
Generally, userland handles streaming I/O (DMA-BUF importing) buffers as follows:

ioctl(VIDIOC_QBUF) -> /* DMA transfer from HW */ -> ioctl(VIDIOC_DQBUF) -> /* access from CPU */ -> ioctl(VIDIOC_QBUF) -> ...

Therefore, the expected semantics are that a buffer is owned by the HW after QBUF, and owned by the CPU after DQBUF.
In practice, before the patch, ioctl(QBUF) triggers vb2_dc_map_dmabuf() and ioctl(DQBUF) triggers vb2_dc_unmap_dmabuf().
This implementation keeps the cache state consistent, because the cache clean is done in vb2_dc_map_dmabuf().
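To make the expected hand-over concrete, the following is a minimal user-space model of the pre-patch lifecycle, assuming (as described above) that every QBUF maps the buffer and every DQBUF unmaps it. It is not kernel code: enum owner, struct model_buf, model_qbuf() and model_dqbuf() are hypothetical names, and the counters only stand in for vb2_dc_map_dmabuf()/vb2_dc_unmap_dmabuf().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (not kernel code) of buffer ownership before the patch.
 * map_calls/unmap_calls stand in for vb2_dc_map_dmabuf() and
 * vb2_dc_unmap_dmabuf(); all names here are hypothetical.
 */
enum owner { OWNER_CPU, OWNER_HW };

struct model_buf {
	enum owner owner; /* side allowed to access the buffer */
	bool mapped;
	int map_calls;
	int unmap_calls;
};

/* VIDIOC_QBUF: map (including the cache clean) and hand the buffer to the HW. */
static void model_qbuf(struct model_buf *b)
{
	assert(b->owner == OWNER_CPU); /* userland only queues buffers it owns */
	b->mapped = true;
	b->map_calls++;
	b->owner = OWNER_HW;
}

/* VIDIOC_DQBUF: unmap and hand the buffer back to the CPU. */
static void model_dqbuf(struct model_buf *b)
{
	assert(b->owner == OWNER_HW && b->mapped);
	b->mapped = false;
	b->unmap_calls++;
	b->owner = OWNER_CPU;
}
```

After any number of QBUF/DQBUF pairs, the buffer ends up owned by the CPU with equal map and unmap counts, so every CPU access happens after an unmap.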

With the patch applied, ioctl(DQBUF) no longer triggers the unmap. Similarly, ioctl(QBUF) no longer triggers the map after the first time.
Therefore, in practice, a buffer is not owned by the CPU right after ioctl(DQBUF).
To preserve the previous buffer semantics, there should be deferred map_dma()/unmap_dma() calls just before the DMA transfer and just before CPU access.
However, no such function in the V4L2 framework was mentioned during the review of the patch.
Nor was there any advice for individual video device drivers, such as adding explicit map_dma()/unmap_dma() calls.
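A minimal user-space model of the patched behaviour shows where the hand-over disappears. It is a sketch under the assumption that the buffer is mapped on the first QBUF only and is never unmapped on DQBUF; sync_for_device() and sync_for_cpu() are hypothetical hooks standing in for the deferred calls suggested above, not an existing videobuf2 API.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (not kernel code) of the patched behaviour: the buffer is
 * mapped on the first QBUF only and is no longer unmapped on DQBUF.
 * sync_for_device()/sync_for_cpu() are HYPOTHETICAL hooks standing in
 * for deferred map/unmap calls; they are not a vb2 API.
 */
enum owner { OWNER_CPU, OWNER_HW };

struct model_buf {
	enum owner owner; /* side whose view of the buffer is up to date */
	bool mapped;
	int map_calls;
};

static void sync_for_device(struct model_buf *b) { b->owner = OWNER_HW; }
static void sync_for_cpu(struct model_buf *b) { b->owner = OWNER_CPU; }

static void model_qbuf(struct model_buf *b, bool explicit_sync)
{
	if (!b->mapped) {
		/* first QBUF of this buffer: map, which hands it to the HW */
		b->mapped = true;
		b->map_calls++;
		b->owner = OWNER_HW;
	} else if (explicit_sync) {
		sync_for_device(b);
	}
	/* otherwise nothing happens: the existing mapping is reused */
}

static void model_dqbuf(struct model_buf *b, bool explicit_sync)
{
	assert(b->mapped); /* the buffer stays mapped after DQBUF */
	if (explicit_sync)
		sync_for_cpu(b);
	/* without the hook, ownership never returns to the CPU */
}
```

With explicit_sync false, the buffer is still owned by the HW after every DQBUF, which is the situation described above; with it true, ownership alternates as before the patch while the map still happens only once.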

> >
> > Could you share your thoughts on this patch:
> >
> > * Do well-implemented capture drivers work well even if this patch is applied?
>
> Yes, dmabuf is used extensively and I have not had any reports of issues.

On many architectures, this problem does not appear.
The problem occurs intermittently, and only when the video capture HW lacks HW-maintained cache coherency with the CPU.
Does this patch take such a case into account?
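To illustrate why only non-coherent hardware is affected, and why the corruption is aligned to cache lines, here is a toy simulation. It assumes a 64-byte line and a 256-byte buffer (arbitrary values; real line sizes depend on the CPU) and models a device whose DMA writes bypass the CPU cache. On cache-coherent hardware, the hardware keeps DMA writes consistent with the CPU caches, which this model deliberately does not do. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Arbitrary sizes for the toy model; real cache-line sizes vary by CPU. */
#define LINE   64
#define BUFSZ  (4 * LINE)
#define NLINES (BUFSZ / LINE)

struct toy_sys {
	unsigned char mem[BUFSZ];   /* DRAM backing the capture buffer */
	unsigned char cache[BUFSZ]; /* CPU-cached copy, per line */
	bool valid[NLINES];         /* lines currently held in the CPU cache */
};

/* Non-coherent device: writes a whole frame to memory, cache untouched. */
static void dma_write_frame(struct toy_sys *s, unsigned char val)
{
	memset(s->mem, val, BUFSZ);
}

/* CPU read of one byte, served from the cache if the line is present. */
static unsigned char cpu_read(struct toy_sys *s, size_t off)
{
	size_t line = off / LINE;

	assert(off < BUFSZ);
	if (!s->valid[line]) {
		memcpy(&s->cache[line * LINE], &s->mem[line * LINE], LINE);
		s->valid[line] = true;
	}
	return s->cache[off];
}

/* Cache invalidate: drop every cached line of the buffer. */
static void cpu_invalidate(struct toy_sys *s)
{
	memset(s->valid, 0, sizeof(s->valid));
}

/* Number of bytes the CPU sees differently from what the DMA wrote. */
static size_t count_stale_bytes(struct toy_sys *s)
{
	size_t n = 0;

	for (size_t off = 0; off < BUFSZ; off++)
		if (cpu_read(s, off) != s->mem[off])
			n++;
	return n;
}

/* Frame 1 is partly read by the CPU, then frame 2 lands in the same buffer. */
static size_t stale_after_requeue(bool invalidate_before_read)
{
	struct toy_sys s;

	memset(&s, 0, sizeof(s));
	dma_write_frame(&s, 1);
	(void)cpu_read(&s, 0);            /* CPU touched line 0 of frame 1 */
	(void)cpu_read(&s, 2 * LINE + 5); /* ... and line 2 */
	dma_write_frame(&s, 2);           /* buffer re-queued, frame 2 captured */
	if (invalidate_before_read)
		cpu_invalidate(&s);
	return count_stale_bytes(&s);
}
```

Without the invalidate, the two lines the CPU read from the previous frame keep returning old data, so the stale region is always a whole number of cache lines; with the invalidate, no stale bytes remain.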

> >
> > * How should a video capture driver call the V4L2/videobuf2 APIs,
> > especially when the hardware does not support cache coherency?
>
> It should all be handled correctly by the core frameworks.
>
> I think you need to debug more inside videobuf2-core.c. Some printk's that show
> the dmabuf fd when the buffer is mapped and when it is unmapped + the length
> it is mapping should hopefully help a bit.

I added printk() and dump_stack() calls to several functions.
The patched function __prepare_dmabuf() is called on every ioctl(QBUF).
vb2_dc_map_dmabuf() is called only on the first ioctl(QBUF) for a given buffer.
After that, vb2_dc_map_dmabuf() was never called again, as the patch intended.
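The observed call counts can be reproduced with a small model. It assumes the patched code skips the map when the same dma-buf is queued again for a buffer that is still mapped; toy_prepare_dmabuf(), toy_map() and toy_unmap() are hypothetical stand-ins for __prepare_dmabuf(), vb2_dc_map_dmabuf() and vb2_dc_unmap_dmabuf(), not their actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (not kernel code) of the call counts seen with the patch.
 * ASSUMPTION: the prepare step skips the map when the same dma-buf is
 * queued again on a plane that is still mapped. All names are
 * hypothetical stand-ins for the vb2 functions mentioned in the text.
 */
struct toy_counters {
	int prepare_calls;
	int map_calls;
	int unmap_calls;
};

struct toy_plane {
	const void *dbuf; /* dma-buf currently attached, NULL if none */
	int mapped;
};

static void toy_map(struct toy_plane *p, struct toy_counters *c)
{
	p->mapped = 1;
	c->map_calls++;
}

static void toy_unmap(struct toy_plane *p, struct toy_counters *c)
{
	p->mapped = 0;
	c->unmap_calls++;
}

/* Called on every VIDIOC_QBUF, as observed for __prepare_dmabuf(). */
static void toy_prepare_dmabuf(struct toy_plane *p, const void *dbuf,
			       struct toy_counters *c)
{
	assert(dbuf != NULL);
	c->prepare_calls++;
	if (p->mapped && p->dbuf == dbuf)
		return; /* same dma-buf, still mapped: reuse the mapping */
	if (p->mapped)
		toy_unmap(p, c); /* a different dma-buf: drop the old mapping */
	p->dbuf = dbuf;
	toy_map(p, c);
}
```

Queuing the same dma-buf five times gives five prepare calls but a single map, matching the pattern reported above; only switching to a different dma-buf triggers a new unmap/map pair.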

Regards,
Yuji

>
> Regards,
>
> Hans
>
> >
> > ***************
> >
> > [1] FYI: the capture driver is not in mainline yet; the candidate is:
> >
> > https://lore.kernel.org/all/20220810132822.32534-1-yuji2.ishikawa@toshiba.co.jp/
> >
> > Regards,
> >
> >               Yuji Ishikawa
> >