Re: DMA-buf and uncached system memory

From: Lucas Stach
Date: Mon Feb 15 2021 - 06:55:09 EST


On Monday, 15.02.2021 at 10:34 +0100, Christian König wrote:
>
> On 15.02.21 at 10:06, Simon Ser wrote:
> > On Monday, February 15th, 2021 at 9:58 AM, Christian König <christian.koenig@xxxxxxx> wrote:
> >
> > > we are currently working on FreeSync and direct scan out from system
> > > memory on AMD APUs in A+A laptops.
> > >
> > > One problem we stumbled over is that our display hardware needs to scan
> > > out from uncached system memory and we currently don't have a way to
> > > communicate that through DMA-buf.
> > >
> > > For our specific use case at hand we are going to implement something
> > > driver specific, but the question is should we have something more
> > > generic for this?
> > >
> > > After all the system memory access pattern is a PCIe extension and as
> > > such something generic.
> > Intel also needs uncached system memory if I'm not mistaken?
>
> No idea, that's why I'm asking. Could be that this is also interesting
> for I+A systems.
>
> > Where are the buffers allocated? If GBM, then it needs to allocate memory that
> > can be scanned out if the USE_SCANOUT flag is set or if a scanout-capable
> > modifier is picked.
> >
> > If this is about communicating buffer constraints between different components
> > of the stack, there were a few proposals about it. The most recent one is [1].
>
> Well the problem here is on a different level of the stack.
>
> See, resolution, pitch etc. can easily be communicated in userspace
> without involvement of the kernel. The worst thing which can happen is
> that you draw garbage into your own application window.
>
> But if you get the caching attributes in the page tables (both CPU as
> well as IOMMU, device etc...) wrong then ARM for example has the
> tendency to just spontaneously reboot.
>
> X86 is fortunately a bit more graceful and you only end up with random
> data corruption, but that is only marginally better.
>
> So to sum it up that is not something which we can leave in the hands of
> userspace.
>
> I think that exporters in the DMA-buf framework should have the ability
> to tell importers if the system memory snooping is necessary or not.

There is already a coarse-grained way to do so: the dma_coherent
property in struct device, which you can check at dmabuf attach time.

However it may not be enough for the requirements of a GPU where the
engines could differ in their dma coherency requirements. For that you
need to either have fake struct devices for the individual engines or
come up with a more fine-grained way to communicate those requirements.
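
To make the coarse-grained check concrete, here is a rough userspace
model of it, not kernel code. Everything prefixed with model_ is
hypothetical and only mirrors the shape of struct device and
struct dma_buf_attachment; the policy of rejecting a non-snooping
importer for a CPU-cached buffer is an assumption made for illustration,
not what any existing exporter does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device; only the coherency bit is modelled. */
struct model_device {
	bool dma_coherent;	/* true if the device's DMA snoops the CPU caches */
};

/* Hypothetical stand-in for struct dma_buf_attachment; dev is the importer. */
struct model_attachment {
	struct model_device *dev;
};

/* Hypothetical exporter-side state: how the backing pages are mapped by the CPU. */
struct model_buffer {
	bool cpu_cached;
};

/*
 * Model of an exporter's attach-time check (assumed policy): a CPU-cached
 * buffer is only handed to importers whose DMA snoops the caches, while an
 * uncached buffer is accepted for any importer.
 */
int model_attach(const struct model_buffer *buf,
		 const struct model_attachment *attach)
{
	if (!buf || !attach || !attach->dev)
		return -EINVAL;

	if (buf->cpu_cached && !attach->dev->dma_coherent)
		return -EINVAL;

	return 0;
}
```

Note that the check only sees one bit per device, which is exactly the
limitation for a GPU whose engines differ in their coherency
requirements.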

> Userspace components can then of course tell the exporter what the
> importer needs, but validation if that stuff is correct and doesn't
> crash the system must happen in the kernel.

What exactly do you mean by "scanout requires non-coherent memory"?
Does the scanout requestor always set the no-snoop PCI flag, so you get
garbage if some writes to memory are still stuck in the caches, or is
it some other requirement?

Regards,
Lucas