RE: DMA direct mapping fix for 5.4 and earlier stable branches

From: obayashi.yoshimasa
Date: Tue Feb 09 2021 - 04:19:11 EST


> > As the drivers are currently under development and Socionext has
> > chosen 5.4 stable kernel for their development. So I will let
> > Obayashi-san answer this if it's possible for them to migrate to 5.10
> > instead?

We have started this development project from last August,
so we have selected 5.4 as most recent and longest lifetime LTS
version at that time.

And we have already finished to develop other device drivers,
and Video converter and CODEC drivers are now in development.

> Why pick a kernel that doesn not support the features they require?
> That seems very odd and unwise.

From the view point of ZeroCopy using DMABUF, is 5.4 not
mature enough, and is 5.10 enough mature ?
This is the most important point for judging migration.

Regards.

> -----Original Message-----
> From: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> Sent: Tuesday, February 9, 2021 5:04 PM
> To: Sumit Garg <sumit.garg@xxxxxxxxxx>
> Cc: Obayashi, Yoshimasa/尾林 善正 <obayashi.yoshimasa@xxxxxxxxxxxxx>; hch@xxxxxx;
> m.szyprowski@xxxxxxxxxxx; robin.murphy@xxxxxxx; iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; Linux Kernel Mailing
> List <linux-kernel@xxxxxxxxxxxxxxx>; stable <stable@xxxxxxxxxxxxxxx>; Daniel Thompson
> <daniel.thompson@xxxxxxxxxx>
> Subject: Re: DMA direct mapping fix for 5.4 and earlier stable branches
>
> On Tue, Feb 09, 2021 at 01:28:47PM +0530, Sumit Garg wrote:
> > Thanks Greg for your response.
> >
> > On Tue, 9 Feb 2021 at 12:28, Greg Kroah-Hartman
> > <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, Feb 09, 2021 at 11:39:25AM +0530, Sumit Garg wrote:
> > > > Hi Christoph, Greg,
> > > >
> > > > Currently we are observing an incorrect address translation
> > > > corresponding to DMA direct mapping methods on 5.4 stable kernel
> > > > while sharing dmabuf from one device to another where both devices
> > > > have their own coherent DMA memory pools.
> > >
> > > What devices have this problem?
> >
> > The problem is seen with V4L2 device drivers which are currently under
> > development for UniPhier PXs3 Reference Board from Socionext [1].
>
> Ok, so it's not even a driver in the 5.4 kernel today, so there's nothing I can do here as there is no regression
> of the existing source tree.
>
> > Following is brief description of the test framework:
> >
> > The issue is observed while trying to construct a Gstreamer pipeline
> > leveraging hardware video converter engine (VPE device) and hardware
> > video encode/decode engine (CODEC device) where we use dmabuf
> > framework for Zero-Copy.
> >
> > Example GStreamer pipeline is:
> > gst-launch-1.0 -v -e videotestsrc \
> > > ! video/x-raw, width=480, height=270, format=NV15 \ ! v4l2convert
> > > device=/dev/vpe0 capture-io-mode=dmabuf-import \ ! video/x-raw,
> > > width=480, height=270, format=NV12 \ ! v4l2h265enc
> > > device=/dev/codec0 output-io-mode=dmabuf \ ! video/x-h265,
> > > format=byte-stream, width=480, height=270 \ ! filesink
> > > location=out.hevc
> >
> > Using GStreamer's V4L2 plugin,
> > - v4l2convert controls VPE driver,
> > - v4l2h265enc controls CODEC driver.
> >
> > In the above pipeline, VPE driver imports CODEC driver's DMABUF for Zero-Copy.
> >
> > [1] arch/arm64/boot/dts/socionext/uniphier-pxs3-ref.dts
> >
> > > And why can't then just use 5.10 to
> > > solve this issue as that problem has always been present for them,
> > > right?
> >
> > As the drivers are currently under development and Socionext has
> > chosen 5.4 stable kernel for their development. So I will let
> > Obayashi-san answer this if it's possible for them to migrate to 5.10
> > instead?
>
> Why pick a kernel that doesn not support the features they require?
> That seems very odd and unwise.
>
> > BTW, this problem belongs to the common code, so others may experience
> > this issue as well.
>
> Then they should move to 5.10 or newer as this just doesn't work on older kernels, right?
>
> > > > I am able to root cause this issue which is caused by incorrect
> > > > virt to phys translation for addresses belonging to vmalloc space
> > > > using virt_to_page(). But while looking at the mainline kernel,
> > > > this patch [1] changes address translation from virt->to->phys to
> > > > dma->to->phys which fixes the issue observed on 5.4 stable kernel
> > > > as well (minimal fix [2]).
> > > >
> > > > So I would like to seek your suggestion for backport to stable
> > > > kernels
> > > > (5.4 or earlier) as to whether we should backport the complete
> > > > mainline commit [1] or we should just apply the minimal fix [2]?
> > >
> > > Whenever you try to create a "minimal" fix, 90% of the time it is
> > > wrong and does not work and I end up having to deal with the mess.
> >
> > I agree with your concerns for having to apply a non-mainline commit
> > onto a stable kernel.
> >
> > > What
> > > prevents you from doing the real thing here? Are the patches to big?
> > >
> >
> > IMHO, yes the mainline patch is big enough to touch multiple
> > architectures. But if that's the only way preferred then I can
> > backport the mainline patch instead.
> >
> > > And again, why not just use 5.10 for this hardware? What hardware
> > > is it?
> > >
> >
> > Please see my response above.
>
> If a feature in the kernel was not present on older kernels, trying to shoe-horn it into them is not wise
> at all. You pick a kernel version to reflect the features/options that you require, and it sounds like
> 5.4 just will not work for them, so to stick with that would be quite foolish.
>
> Just move to 5.10, much simpler!
>
> thanks,
>
> greg k-h